Abstract



We show that one can approximate the least fixed point solution for a multivariate system of monotone probabilistic max (min) polynomial equations, referred to as maxPPSs (and minPPSs, respectively), in time polynomial in both the encoding size of the system of equations and in $\log(1/\varepsilon)$, where $\varepsilon > 0$ is the desired additive error bound of the solution. (The model of computation is the standard Turing machine model.) We establish this result using a generalization of Newton's method which applies to maxPPSs and minPPSs, even though the underlying functions are only piecewise-differentiable. This generalizes our recent work which provided a P-time algorithm for purely probabilistic PPSs.

These equations form the Bellman optimality equations for several important classes of infinite-state Markov Decision Processes (MDPs). Thus, as a corollary, we obtain the first polynomial time algorithms for computing to within arbitrary desired precision the optimal value vector for several classes of infinite-state MDPs which arise as extensions of classic, and heavily studied, purely stochastic processes. These include both the problem of maximizing and minimizing the termination (extinction) probability of multi-type branching MDPs, stochastic context-free MDPs, and 1-exit Recursive MDPs.

Furthermore, we also show that we can compute in P-time an $\varepsilon$-optimal policy for both maximizing and minimizing branching, context-free, and 1-exit-Recursive MDPs, for any given desired $\varepsilon > 0$. This is despite the fact that actually computing optimal strategies is Sqrt-Sum-hard and PosSLP-hard in this setting.

We also derive, as an easy consequence of these results, an FNP upper bound on the complexity of computing the value (within arbitrary desired precision) of branching simple stochastic games (BSSGs) and related infinite-state turn-based stochastic game models.

1 Introduction 

Markov Decision Processes (MDPs) are a fundamental model for stochastic dynamic optimization 
and optimal control, with applications in many fields. They extend purely stochastic processes 
(Markov chains) with a controller (an agent) who can partially affect the evolution of the process, 
and seeks to optimize some objective. For many important classes of MDPs, the task of computing 
the optimal value of the objective, starting at any state of the MDP, can be rephrased as the problem 
of solving the associated Bellman optimality equations for that MDP model. In particular, for finite- 
state MDPs where, e.g., the objective is to maximize (or minimize) the probability of eventually 
reaching some target state, the associated Bellman equations are max-(min-)linear equations, and we know how to solve such equations in P-time using linear programming (see, e.g., [20]). The same holds for a number of other classes of finite-state MDPs.

In many important settings however, the state space of the processes of interest, both for purely 
stochastic processes, as well as for controlled ones (MDPs), is not finite, even though the processes 
can be specified in a finite way. For example, consider multi-type branching processes (BPs) [18, 16], a classic probabilistic model with applications in many areas (biology, physics, etc.). A BP models the stochastic evolution of a population of entities of distinct types. In each generation, every entity of each type T produces a set of entities of various types in the next generation according to a given probability distribution on offspring for the type T. In a Branching Markov Decision Process (BMDP) [19, 21], there is a controller who can take actions that affect the probability distribution for the sets of offspring for each entity of each type. For both BPs and BMDPs, the state space consists of all possible populations, given by the number of entities of the various types, so there are infinitely many states. From the computational point of view, the usefulness of such infinite-state models hinges on whether their analysis remains tractable.

In recent years there has been a body of research aimed at studying the computational complexity of key analysis problems associated with MDP extensions (and, more generally, stochastic game extensions) of important classes of finitely-presented but countably infinite-state stochastic processes, including controlled extensions of classic multi-type branching processes (i.e., BMDPs), stochastic context-free grammars, and discrete-time quasi-birth-death processes. In [14] a model called recursive Markov decision processes (RMDPs) was studied that is in a precise sense more general than all of these, and forms the MDP extension of recursive Markov chains [15] (and equivalently, probabilistic pushdown systems [10]), or it can be viewed alternatively as the extension of finite-state MDPs with recursion.

A central analysis problem for all of these models, which forms the key to a number of other analyses, is the problem of computing their optimal termination (extinction) probability. For example, in the setting of multi-type Branching MDPs (BMDPs), these key quantities are the maximum (minimum) probabilities, over all control strategies (or policies), that starting from a single entity of a given type, the process will eventually reach extinction (i.e., the state where no entities have survived). From these quantities, one can compute the optimum probability for any initial population, as well as other quantities of interest.

One can indeed form Bellman optimality equations for the optimal extinction probabilities of BMDPs, and for a number of related important infinite-state MDP models. However, it turns out that these optimality equations are no longer max/min linear but rather are max/min polynomial equations ([14]). Specifically, the Bellman equations for BMDPs with the objective of maximizing (or minimizing) extinction probability are multivariate systems of monotone probabilistic max (or min) polynomial equations, which we call max/minPPSs, of the form $x_i = P_i(x_1, \ldots, x_n)$, $i = 1, \ldots, n$, where each $P_i(x) = \max_j q_{i,j}(x)$ (respectively $P_i(x) = \min_j q_{i,j}(x)$) is the max (min) over a finite number of probabilistic polynomials, $q_{i,j}(x)$. A probabilistic polynomial, $q(x)$, is a multi-variate polynomial where the monomial coefficients and constant term of $q(x)$ are all non-negative and sum to $\le 1$. We write these equations in vector form as $x = P(x)$. Then $P(x)$ defines a mapping $P : [0,1]^n \to [0,1]^n$ that is monotone, and thus (by Tarski's theorem) has a least fixed point in $[0,1]^n$. The equations $x = P(x)$ can have more than one solution, but it turns out that the optimal value vector for the corresponding BMDP is precisely the least fixed point (LFP) solution vector $q^* \in [0,1]^n$, i.e., the (coordinate-wise) least non-negative solution ([14]).
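To make the role of the least fixed point concrete, here is a minimal Python sketch on a hypothetical one-variable maxPPS (not taken from the paper). It approximates $q^*$ by plain fixed-point (Kleene, or value) iteration from the zero vector. This is only a simple baseline that can converge very slowly in general; it is not the generalized Newton method developed in this paper.

```python
def P(x: float) -> float:
    """Hypothetical one-variable maxPPS: x = max{0.4 + 0.6 x^2, 0.3 + 0.5 x}."""
    return max(0.4 + 0.6 * x * x, 0.3 + 0.5 * x)


def kleene_lfp(f, iterations: int = 200) -> float:
    """Iterate x <- f(x) starting from 0.

    For a monotone map of [0, 1] into itself the iterates increase and approach the
    least fixed point; the number of iterations needed can be very large in general.
    """
    x = 0.0
    for _ in range(iterations):
        x = f(x)
    return x


q_star = kleene_lfp(P)
print(q_star)                     # approximately 2/3
print(abs(P(1.0) - 1.0) < 1e-12)  # x = 1 is another (larger) fixed point
```

Here $x = 1$ also solves $x = P(x)$, but the iterates increase from 0 and stop at the smaller solution $2/3$, which is the LFP.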
Already for pure stochastic multi-type branching processes (BPs), the extinction probabilities may be irrational values. The problem of deciding whether the extinction probability of a BP is > p, for a given probability p, is in PSPACE ([15]), and likewise, deciding whether the optimal extinction probability of a BMDP is > p is in PSPACE ([14]). These PSPACE upper bounds appeal to decision procedures for the existential theory of reals for solving the associated (max/min)PPS equations. However, already for BPs, it was shown in [15] that this quantitative decision problem is at least as hard as the square-root sum problem, as well as a (much) harder and more fundamental problem called PosSLP, which captures the power of unit-cost exact rational arithmetic. It is a long-standing open problem whether either of these decision problems is in NP, or even in the polynomial time hierarchy (see [HQS] for more information on these problems). Thus, such quantitative decision problems are unlikely to have P-time algorithms, even in the purely stochastic setting, so we can certainly not expect to find P-time algorithms for the extension of these models to the MDP setting. On the other hand, it was shown in [15] and [14] that for both BPs and BMDPs the qualitative decision problem of deciding whether the optimal extinction probability $q^*_i = 0$ or whether $q^*_i = 1$ can be solved in polynomial time.

Despite decades of theoretical and practical work on computational problems like extinction relating to multi-type branching processes, and equivalent termination problems related to stochastic context-free grammars, until recently it was not even known whether one could obtain any non-trivial approximation of the extinction probability of a purely stochastic multi-type branching process (BP) in P-time. The extinction probabilities of pure BPs are the LFP of a system of probabilistic polynomial equations (PPS), without max or min. In recent work [11], we provided the first polynomial time algorithm for computing (i.e., approximating) to within any desired additive error $\varepsilon > 0$ the LFP of a given PPS, and hence the extinction probability vector $q^*$ for a given pure stochastic BP, in time polynomial in both the encoding size of the PPS (or the BP) and in $\log(1/\varepsilon)$. The algorithm works in the standard Turing model of computation. Our algorithm was based on an approach using Newton's method that was first introduced and studied in [15]. In [15] the approach was studied for more general systems of monotone polynomial equations (MPSs), and it was subsequently further studied in [9].

Note that unlike PPSs and MPSs, the min/maxPPSs that define the Bellman equations for BMDPs are no longer differentiable functions (they are only piecewise differentiable). Thus, a priori, it is not even clear how one could apply a Newton-type method toward solving them.

In this paper we extend the results of [11], and provide the first polynomial time algorithms for approximating the LFP of both maxPPSs and minPPSs, and thus the first polynomial time algorithm for computing (to within any desired additive error) the optimal value vector for BMDPs with the objective of maximizing or minimizing their extinction probability.

Our approach is based on a generalized Newton's method (GNM) that extends Newton's method in a natural way to the setting of max/minPPSs, where each iteration requires the computation of the least (greatest) solution of a max- (min-) linear system of equations, both of which we show can be solved using linear programming. Our approach also makes crucial use of the P-time algorithms in [14] for qualitative analysis of max/min BMDPs, which allow us to remove variables $x_i$ where the LFP is $q^*_i = 1$ or where $q^*_i = 0$. The algorithms themselves have the nice feature that they are relatively simple, although the analysis of their correctness and time complexity is rather involved.

We furthermore show that we can compute $\varepsilon$-optimal (pure) strategies (policies) for both maxPPSs and minPPSs, for any given desired $\varepsilon > 0$, in time polynomial in both the encoding size of the max/minPPS and in $\log(1/\varepsilon)$. This result is at first glance rather surprising, because there are only a bounded number of distinct pure policies for a max/minPPS, and computing an optimal policy is PosSLP-hard. The proof of this result involves an intricate analysis of bounds on the norms of certain matrices associated with (max/min)PPSs.

Finally, we consider Branching simple stochastic games (BSSGs), which are two-player turn-based stochastic games, where one player wants to maximize, and the other wants to minimize, the extinction probability (see [14]). The value of these games (which are determined) is characterized by the LFP solution of associated min-maxPPSs which combine both min and max operators (see [14]). We observe that our results easily imply an FNP upper bound for $\varepsilon$-approximating the value of BSSGs and computing $\varepsilon$-optimal strategies for them.

Related work: We have already mentioned some of the important relevant results. BMDPs and related processes have been studied previously in both the operations research (e.g. [19, 21, 7]) and computer science literature (e.g. [14], [HJ H]), but no efficient algorithms were known for the (approximate) computation of the relevant optimal probabilities and policies; the best known upper bound was PSPACE [14].

In [14] we introduced Recursive Markov Decision Processes (RMDPs), a recursive extension of MDPs. We showed that for general RMDPs, the problem of computing the optimal termination probabilities, even within any nontrivial approximation, is undecidable. However, we showed for the important class of 1-exit RMDPs (1-RMDP), the optimal probabilities can be expressed by min (or max) PPSs, and in fact the problems of computing (approximately) the LFP of a min/maxPPS and the termination probabilities of a max/min 1-RMDP, or BMDP, are all polynomially equivalent. We furthermore showed in [14] that there are always pure, memoryless optimal policies for both maximizing and minimizing 1-RMDPs (and for the more general turn-based stochastic games).

In [12], 1-RMDPs with a different objective were studied, namely optimizing the total expected reward in a setting with positive rewards. In that setting, things are much simpler: the Bellman equations turn out to be max/min-linear, the optimal values are rational, and they can be computed exactly in P-time using linear programming.

A work that is more closely related to this paper is [8] by Esparza, Gawlitza, Kiefer, and Seidl. They studied more general monotone min-maxMPSs, i.e., systems of monotone polynomial equations that include both min and max operators, and they presented two different iterative analogs of Newton's method for approximating the LFP of a min-maxMPS, x = P(x). Their methods are related to ours, but differ in key respects. Both of their methods use certain piecewise linear functions to approximate the min-maxMPS in each iteration, which is also what one does to solve each iteration of our generalized Newton's method. However, the precise nature of their piecewise linearizations, as well as how they solve them, differ in important ways from ours, even when they are applied in the specific context of maxPPSs or minPPSs. They show, working in the unit-cost exact arithmetic model, that using their methods one can compute j "valid bits" of the LFP (i.e., compute the LFP within relative error at most $2^{-j}$) in $k_P + c_P \cdot j$ iterations, where $k_P$ and $c_P$ are terms that depend in some way on the input system, x = P(x). However, they give no constructive upper bounds on $k_P$, and their upper bounds on $c_P$ are exponential in the number n of variables of x = P(x). Note that MPSs are more difficult: even without the min and max operators, we know that it is PosSLP-hard to approximate their LFP within any nontrivial constant additive error c < 1/2, even for pure MPSs that arise from Recursive Markov Chains [15].

Another subclass of RMDPs, called one-counter MDPs (a controlled extension of one-counter Markov chains and Quasi-Birth-Death processes [13]), has been studied, and the approximation of their optimal termination probabilities was recently shown to be computable, but only in exponential time ([3]). This subclass is incomparable with 1-RMDPs and BMDPs, and does not have min/maxPPSs as Bellman equations.

2 Definitions and Background 

For an n-vector of variables $x = (x_1, \ldots, x_n)$, and a vector $v \in \mathbb{N}^n$, we use the shorthand notation $x^v$ to denote the monomial $x_1^{v_1} \cdots x_n^{v_n}$. Let $\langle \alpha_r \in \mathbb{N}^n \mid r \in R \rangle$ be a multi-set of n-vectors of natural numbers, indexed by the set R. Consider a multi-variate polynomial $P_i(x) = \sum_{r \in R} p_r x^{\alpha_r}$, for some rational-valued coefficients $p_r$, $r \in R$. We shall call $P_i(x)$ a monotone polynomial if $p_r \ge 0$ for all $r \in R$. If in addition, we also have $\sum_{r \in R} p_r \le 1$, then we shall call $P_i(x)$ a probabilistic polynomial.
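These definitions can be checked mechanically. The sketch below assumes a hypothetical sparse encoding in which a polynomial is a list of (coefficient, exponent vector) pairs, and uses exact rational arithmetic; the function names are illustrative only.

```python
from fractions import Fraction
from math import prod

# Sparse polynomial: a list of (coefficient p_r, exponent vector alpha_r) pairs.
# This encoding is an illustrative assumption, not a data structure from the paper.
Poly = list[tuple[Fraction, tuple[int, ...]]]


def is_monotone(poly: Poly) -> bool:
    """All coefficients p_r are non-negative."""
    return all(p >= 0 for p, _ in poly)


def is_probabilistic(poly: Poly) -> bool:
    """Monotone, and the coefficients (including the constant term) sum to at most 1."""
    return is_monotone(poly) and sum(p for p, _ in poly) <= 1


def evaluate(poly: Poly, x: tuple[Fraction, ...]) -> Fraction:
    """Exact value of sum_r p_r * x^{alpha_r}, with x^v = x_1^{v_1} * ... * x_n^{v_n}."""
    return sum((p * prod(xi ** vi for xi, vi in zip(x, alpha)) for p, alpha in poly),
               Fraction(0))


# q(x_1, x_2) = 1/2 x_1 x_2 + 1/4 x_2^2 + 1/8; its coefficients sum to 7/8.
q = [(Fraction(1, 2), (1, 1)), (Fraction(1, 4), (0, 2)), (Fraction(1, 8), (0, 0))]
print(is_probabilistic(q), evaluate(q, (Fraction(1, 2), Fraction(1))))  # True 5/8
```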

Definition 2.1. A probabilistic (respectively, monotone) polynomial system of equations, x = P(x), which we shall call a PPS (respectively, an MPS), is a system of n equations, $x_i = P_i(x)$, in n variables $x = (x_1, x_2, \ldots, x_n)$, where for all $i \in \{1, 2, \ldots, n\}$, $P_i(x)$ is a probabilistic (respectively, monotone) polynomial.

A maximum-minimum probabilistic polynomial system of equations, x = P(x), called a max-minPPS, is a system of n equations in n variables $x = (x_1, x_2, \ldots, x_n)$, where for all $i \in \{1, 2, \ldots, n\}$, either:

• Max-polynomial: $P_i(x) = \max\{q_{i,j}(x) : j \in \{1, \ldots, m_i\}\}$, or:

• Min-polynomial: $P_i(x) = \min\{q_{i,j}(x) : j \in \{1, \ldots, m_i\}\}$,

where each $q_{i,j}(x)$ is a probabilistic polynomial, for every $j \in \{1, \ldots, m_i\}$.

We shall call such a system a maxPPS (respectively, a minPPS) if for every $i \in \{1, \ldots, n\}$, $P_i(x)$ is a Max-polynomial (respectively, a Min-polynomial).

Note that we can view a PPS in n variables as a maxPPS, or as a minPPS, where $m_i = 1$ for every $i \in \{1, \ldots, n\}$.

For computational purposes we assume that all the coefficients are rational. We assume that the polynomials in a system are given in sparse form, i.e., by listing only the nonzero terms, with the coefficient and the nonzero exponents of each term given in binary. We let |P| denote the total bit encoding length of a system x = P(x) under this representation.
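As a rough illustration of |P|, the following sketch counts bits under one plausible convention. This is an assumption: it charges the binary length of each numerator, denominator, variable index and exponent, and ignores separators; the paper's bounds do not depend on such constant-factor details, and the function names are hypothetical.

```python
from fractions import Fraction


def bits(n: int) -> int:
    """Number of bits in the binary representation of |n| (at least 1)."""
    return max(1, abs(n).bit_length())


def encoding_size(system) -> int:
    """Rough bit count of a sparse max/minPPS under one assumed convention.

    system[i] lists the polynomials on the right-hand side of x_i (a single polynomial
    for a PPS row); each polynomial is a list of terms (coefficient, {variable: exponent})
    that lists only nonzero coefficients and nonzero exponents.
    """
    total = 0
    for row in system:
        for poly in row:
            for coeff, exponents in poly:
                total += bits(coeff.numerator) + bits(coeff.denominator)
                total += sum(bits(var) + bits(e) for var, e in exponents.items())
    return total


# x_1 = max{1/2 x_1^2 + 1/4, 3/4 x_1}
example = [[[(Fraction(1, 2), {0: 2}), (Fraction(1, 4), {})], [(Fraction(3, 4), {0: 1})]]]
print(encoding_size(example))  # 17
```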

We use max/minPPS to refer to a system of equations, x = P(x), that is either a maxPPS or a minPPS. While [14] also considered systems of equations containing both max and min equations (which we refer to as max-minPPSs), our primary focus will be on systems that contain just one or the other. (But we shall also obtain results about max-minPPSs as a corollary.)

As was shown in [14], any max-minPPS, x = P(x), has a least fixed point (LFP) solution, $q^* \in [0,1]^n$, i.e., $q^* = P(q^*)$ and if $q = P(q)$ for some $q \in [0,1]^n$ then $q^* \le q$ (coordinate-wise inequality). As observed in [15, 14], $q^*$ may in general contain irrational values, even in the case of PPSs. The central results of this paper yield P-time algorithms for computing $q^*$ to within arbitrary precision, both in the case of maxPPSs and minPPSs. As we shall explain, our P-time upper bounds for computing (to within any desired accuracy) the least fixed point of maxPPSs and minPPSs will also yield, as corollaries, FNP upper bounds for computing approximately the LFP of max-minPPSs.
Definition 2.2. We define a policy for a max/minPPS, x = P(x), to be a function $\sigma : \{1, \ldots, n\} \to \mathbb{N}$ such that $1 \le \sigma(i) \le m_i$.

Intuitively, for each variable, $x_i$, a policy selects one of the probabilistic polynomials, $q_{i,\sigma(i)}(x)$, that appear on the RHS of the equation $x_i = P_i(x)$, and over which $P_i(x)$ is the maximum/minimum.

Definition 2.3. Given a max/minPPS x = P(x) over n variables, and a policy $\sigma$ for x = P(x), we define the PPS $x = P_\sigma(x)$ by:

$(P_\sigma)_i(x) = q_{i,\sigma(i)}(x)$

for all $i \in \{1, \ldots, n\}$.

Obviously, since a PPS is a special case of a max/minPPS, every PPS also has a unique LFP solution (this was established earlier in [15]). Given a max/minPPS, x = P(x), and a policy $\sigma$, we use $q^*_\sigma$ to denote the LFP solution vector for the PPS $x = P_\sigma(x)$.

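Definition 2.4. For a maxPPS, x = P(x), a policy $\sigma^*$ is called optimal if for all other policies $\sigma$, $q^*_{\sigma^*} \ge q^*_\sigma$. For a minPPS x = P(x) a policy $\sigma^*$ is optimal if for all other policies $\sigma$, $q^*_{\sigma^*} \le q^*_\sigma$. A policy $\sigma$ is $\varepsilon$-optimal for $\varepsilon > 0$ if $\|q^*_\sigma - q^*\|_\infty \le \varepsilon$.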

A non-trivial fact is that optimal policies always exist, and furthermore that they actually attain the LFP $q^*$ of the max/minPPS:

Theorem 2.5 ([14], Theorem 2). For any max/minPPS, x = P(x), there always exists an optimal policy $\sigma^*$, and furthermore $q^* = q^*_{\sigma^*}$.
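Theorem 2.5 can be observed on a tiny example by brute force. The sketch below uses a hypothetical two-variable maxPPS, encodes policies with 0-based indices, and approximates every LFP by floating-point Kleene iteration (assumptions of the sketch). It builds $P_\sigma$ for every pure policy $\sigma$ and compares the best policy value with the LFP of the maxPPS itself. Enumerating all $\prod_i m_i$ policies is exponential in general and is shown only for illustration.

```python
from itertools import product

# Hypothetical 2-variable maxPPS (illustration only): SYSTEM[i][j] is the polynomial
# q_{i,j}, and P_i(x) = max_j q_{i,j}(x). Policies use 0-based indices here.
SYSTEM = [
    [lambda x: 0.4 + 0.6 * x[0] * x[1], lambda x: 0.3 + 0.5 * x[1]],
    [lambda x: 0.2 + 0.5 * x[0], lambda x: 0.1 + 0.3 * x[0] * x[0]],
]
N = len(SYSTEM)


def kleene_lfp(rhs, iterations=2000):
    """Approximate the LFP of x = rhs(x) by iterating from the zero vector."""
    x = [0.0] * N
    for _ in range(iterations):
        x = rhs(x)
    return x


def max_rhs(x):
    """P(x): in each row, the maximum over all choices q_{i,j}(x)."""
    return [max(q(x) for q in row) for row in SYSTEM]


def policy_rhs(sigma):
    """P_sigma(x): row i keeps only the polynomial q_{i, sigma[i]}."""
    return lambda x: [SYSTEM[i][sigma[i]](x) for i in range(N)]


q_star = kleene_lfp(max_rhs)
# Brute force over all prod_i m_i pure policies (exponential in n; illustration only).
values = {sigma: kleene_lfp(policy_rhs(sigma))
          for sigma in product(*(range(len(row)) for row in SYSTEM))}
best = max(values, key=lambda s: sum(values[s]))
print(best, values[best], q_star)
```

On this example the search selects $\sigma = (0, 0)$, i.e., the first polynomial in both rows, and its LFP agrees with the LFP of the maxPPS up to floating-point error.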

Probabilistic polynomial systems can be used to capture central probabilities of interest for several basic stochastic models, including Multi-type Branching Processes (BP), Stochastic Context-Free Grammars (SCFG) and the class of 1-exit Recursive Markov Chains (1-RMC) [15]. Max- and minPPSs can be similarly used to capture the central optimum probabilities of corresponding stochastic optimization models: (Multi-type) Branching Markov Decision processes (BMDP), Context-Free MDPs (CF-MDP), and 1-exit Recursive Markov Decision Processes (1-RMDP) [14]. We now define BMDPs and 1-RMDPs.

A Branching Markov Decision Process (BMDP) consists of a finite set $V = \{T_1, \ldots, T_n\}$ of types, a finite set $A_i$ of actions for each type, and a finite set $R(T_i, a)$ of probabilistic rules for each type $T_i$ and action $a \in A_i$. Each rule $r \in R(T_i, a)$ has the form $T_i \xrightarrow{p_r} \alpha_r$, where $\alpha_r$ is a finite multi-set whose elements are in V, $p_r \in (0, 1]$ is the probability of the rule, and the sum of the probabilities of all the rules in $R(T_i, a)$ is equal to 1: $\sum_{r \in R(T_i, a)} p_r = 1$.

Intuitively, a BMDP describes the stochastic evolution of entities of given types in the presence of a controller that can influence the evolution. Starting from an initial population (i.e., set of entities of given types) $X_0$ at time (generation) 0, a sequence of populations $X_1, X_2, \ldots$ is generated, where $X_k$ is obtained from $X_{k-1}$ as follows. First the controller selects for each entity of $X_{k-1}$ an available action for the type of the entity; then a rule is chosen independently and simultaneously for every entity of $X_{k-1}$ probabilistically according to the probabilities of the rules for the type of the entity and the selected action, and the entity is replaced by a new set of entities with the types specified by the right-hand side of the rule. The process is repeated as long as the current population is nonempty, and terminates if and when $X_k$ becomes empty. The objective of the controller is either to minimize the probability of termination (i.e., extinction of the population), in which case the process is a minBMDP, or to maximize the termination probability, in which case it is a maxBMDP.

1 Theorem 2 of [14] is stated in the more general context of 1-exit Recursive Simple Stochastic Games and shows that also for max-minPPSs, both the max player and the min player have optimal policies that attain the LFP $q^*$.

At each stage k, the controller is allowed in principle to select the actions for the entities of $X_{k-1}$ based on the whole past history, may use randomization (a mixed strategy), and may make different choices for entities of the same type. However, it turns out that these flexibilities do not increase the controller's power, and there is always an optimal pure, memoryless strategy that always chooses the same action for all entities of the same type ([14]).
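The generation-by-generation dynamics can be simulated directly. The following is a minimal Monte Carlo sketch for a hypothetical one-type BMDP under a fixed pure memoryless policy. It truncates runs whose population exceeds a cap (counting them as non-extinct), which is a heuristic assumption, and it yields only statistical estimates, not the exact values studied in this paper.

```python
import random

# Hypothetical one-type BMDP (illustration only):
# RULES[(type, action)] = list of (probability, offspring types).
RULES = {
    ("T", "a"): [(0.4, []), (0.6, ["T", "T"])],       # extinction probability 2/3
    ("T", "b"): [(0.7, []), (0.3, ["T", "T", "T"])],  # mean offspring 0.9: dies out
}


def goes_extinct(policy, start, rng, max_generations=200, cap=100):
    """Simulate one run under a pure memoryless policy {type: action}.

    Runs whose population exceeds `cap` (or that survive `max_generations`) are
    counted as non-extinct; this truncation is a heuristic assumption of the sketch.
    """
    population = list(start)
    for _ in range(max_generations):
        if not population:
            return True
        if len(population) > cap:
            return False
        next_population = []
        for entity in population:
            rules = RULES[(entity, policy[entity])]
            weights = [p for p, _ in rules]
            # Choose one rule for this entity, independently of all other entities.
            offspring = rng.choices([kids for _, kids in rules], weights=weights)[0]
            next_population.extend(offspring)
        population = next_population
    return not population


def estimate(policy, trials=2000, seed=0):
    """Fraction of simulated runs, started from a single entity of type T, that die out."""
    rng = random.Random(seed)
    return sum(goes_extinct(policy, ["T"], rng) for _ in range(trials)) / trials


print(estimate({"T": "a"}), estimate({"T": "b"}))
```

Under action a the extinction probability is the LFP of $x = 0.4 + 0.6x^2$, namely $2/3$; under action b it is the LFP of $x = 0.7 + 0.3x^3$, namely 1. So a minBMDP controller prefers a, and a maxBMDP controller prefers b.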

For each type $T_i$ of a minBMDP (respectively, maxBMDP), let $q^*_i$ be the minimum (resp. maximum) probability of termination if the initial population consists of a single entity of type $T_i$. From the given minBMDP (maxBMDP) we can construct a minPPS (resp. maxPPS) x = P(x) whose LFP is precisely the vector $q^*$ of optimal termination (extinction) probabilities (see Theorem 20 in the full version of [14]): the min/max polynomial $P_i(x)$ for each type $T_i$ contains one polynomial $q_{i,j}(x)$ for each action $j \in A_i$, with $q_{i,j}(x) = \sum_{r \in R(T_i, j)} p_r x^{\alpha_r}$ (here the multi-set $\alpha_r$ is identified with the vector in $\mathbb{N}^n$ that counts how many entities of each type it contains).
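The construction of $P_i(x)$ from the rules can be sketched as follows, for a hypothetical two-type BMDP. Each multi-set $\alpha_r$ is turned into the exponent vector that counts how many entities of each type it contains; the same polynomials serve for a minPPS or a maxPPS, depending on the controller's objective. The encoding and function name are assumptions for illustration.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical two-type BMDP (illustration only):
# RULES[type][action] = list of (probability p_r, offspring multiset alpha_r as a list of types).
RULES = {
    "T1": {"a": [(Fraction(1, 2), []), (Fraction(1, 2), ["T1", "T2"])],
           "b": [(Fraction(1, 3), ["T2"]), (Fraction(2, 3), ["T1", "T1"])]},
    "T2": {"c": [(Fraction(1, 4), []), (Fraction(3, 4), ["T1"])]},
}


def bmdp_to_pps(rules):
    """Build, for each type T_i, one polynomial q_{i,j} per action j.

    Each polynomial is a dict {exponent vector alpha: coefficient}; rules whose offspring
    multisets coincide are merged by adding their probabilities.
    """
    types = sorted(rules)
    system = []
    for t in types:
        choices = []
        for action in sorted(rules[t]):
            if sum(p for p, _ in rules[t][action]) != 1:
                raise ValueError(f"probabilities of R({t}, {action}) do not sum to 1")
            poly = Counter()
            for p, offspring in rules[t][action]:
                counts = Counter(offspring)
                alpha = tuple(counts[u] for u in types)  # number of offspring of each type
                poly[alpha] += p
            choices.append(dict(poly))
        system.append(choices)
    return types, system


types, system = bmdp_to_pps(RULES)
print(types)
print(system)
```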

A 1-exit Recursive Markov Decision Process (1-RMDP) consists of a finite set of components $A_1, \ldots, A_k$, where each component $A_i$ is essentially a finite-state MDP augmented with the ability to make recursive calls to itself and other components. Formally, each component $A_i$ has a finite set $N_i$ of nodes, which are partitioned into probabilistic nodes and controlled nodes, and a finite set $B_i$ of "boxes" (or supernodes), where each box is mapped to some component. One node $en_i$ is specified as the entry of the component $A_i$ and one node $ex_i$ as the exit of $A_i$. The exit node has no outgoing edges. All other nodes and the boxes have outgoing edges; the edges out of the probabilistic nodes and boxes are labelled with probabilities, where the sum of the probabilities out of the same node or box is equal to 1.

Execution of a 1-RMDP starts at some node, for example, the entry $en_1$ of component $A_1$. When the execution is at a probabilistic node v, then an edge out of v is chosen randomly according to the probabilities of the edges out of v. At a controlled node v, an edge out of v is chosen by a controller who seeks to optimize his objective. When the execution reaches a box b of $A_i$ mapped to some component $A_j$, then the current component is suspended and a recursive call to $A_j$ is initiated at its entry node $en_j$; if the call to $A_j$ terminates, i.e., eventually reaches its exit node $ex_j$, then the execution of component $A_i$ resumes from box b following an edge out of b chosen according to the probability distribution of the outgoing edges of b. Note that a call to a component can make further recursive calls; thus, at any point there is in general a stack of suspended recursive calls, and there can be an arbitrary number of such suspended calls, so a 1-RMDP in general induces an infinite-state MDP. The process terminates when the execution reaches the exit of the component of the initial node and there are no suspended recursive calls.

There are two types of 1-RMDPs with a termination objective: in a min 1-RMDP (resp. max 1-RMDP) the objective of the controller is to minimize (resp. maximize) the probability of termination. In principle, a controller can use the complete past history of the process and also use randomization (i.e., a mixed strategy) to select, at each point when the execution reaches a controlled node, which edge to select out of the node. As shown in [14], however, there is always an optimal strategy that is pure, stackless and memoryless, i.e., selects deterministically one edge out of each controlled node, the same one every time, independent of the stack and of the past history (including the starting node). From a given min or max 1-RMDP we can construct efficiently a minPPS or maxPPS whose LFP yields the minimum or maximum termination probabilities for all the different possible starting vertices of the 1-RMDP [14]. Conversely, from any given min/max PPS, we can efficiently construct a 1-RMDP whose optimal termination probabilities yield the LFP of the min/max PPS. The system of equations for a 1-RMDP has a particularly simple form, and all max/minPPSs can be put in that form.

2 The restriction to having only one entry node is not important; any multi-entry RMDP can be efficiently transformed to a 1-entry RMDP. The restriction to 1-exit is very important: multi-exit RMDPs lead to undecidable termination problems, even for any non-trivial approximation of the optimal values [14].

It is convenient to put max/minPPS in the following simple form. 

Definition 2.6. A maxPPS in simple normal form (SNF), x = P(x), is a system of n equations in n variables $x_1, x_2, \ldots, x_n$ where each $P_i(x)$ for $i = 1, 2, \ldots, n$ is in one of three forms:

• Form L: $P(x)_i = a_{i,0} + \sum_{j=1}^{n} a_{i,j} x_j$, where $a_{i,j} \ge 0$ for all j, and such that $\sum_{j=0}^{n} a_{i,j} \le 1$

• Form Q: $P(x)_i = x_j x_k$ for some j, k

• Form M: $P(x)_i = \max\{x_j, x_k\}$ for some j, k

We define SNF form for minPPSs analogously: only the definition of "Form M" changes, replacing 
max with min. 

In the setting of a max/minPPS in SNF form, for simplicity in notation, when we talk about a policy, if $P_i(x)$ has form M, say $P_i(x) = \max\{x_j, x_k\}$, then when it is clear from the context we will use $\sigma(i) = k$ to mean that the policy $\sigma$ chooses $x_k$ among the two choices $x_j$ and $x_k$ available in $P_i(x) = \max\{x_j, x_k\}$.
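A possible encoding of SNF rows, together with the policy convention just described, is sketched below. The tuple encoding and the function names are assumptions made for illustration; a policy is given as a dict from each Form M row to the index of the chosen variable.

```python
# Hypothetical SNF encoding (illustration only):
#   ("L", a0, {j: a_ij})  Form L: x_i = a0 + sum_j a_ij x_j
#   ("Q", j, k)           Form Q: x_i = x_j * x_k
#   ("M", j, k)           Form M: x_i = max{x_j, x_k}
SNF = [
    ("M", 1, 2),
    ("L", 0.3, {0: 0.2, 2: 0.3}),
    ("Q", 0, 1),
]


def evaluate(system, x):
    """Compute the vector P(x) for an SNF maxPPS."""
    out = []
    for row in system:
        if row[0] == "L":
            _, a0, coeffs = row
            out.append(a0 + sum(a * x[j] for j, a in coeffs.items()))
        elif row[0] == "Q":
            out.append(x[row[1]] * x[row[2]])
        elif row[0] == "M":
            out.append(max(x[row[1]], x[row[2]]))
        else:
            raise ValueError(f"unknown form {row[0]!r}")
    return out


def apply_policy(system, sigma):
    """Return P_sigma: each Form M row i becomes the Form L row x_i = x_{sigma[i]}."""
    result = []
    for i, row in enumerate(system):
        if row[0] == "M":
            k = sigma[i]
            if k not in (row[1], row[2]):
                raise ValueError(f"sigma({i}) = {k} is not one of the two choices")
            result.append(("L", 0.0, {k: 1.0}))
        else:
            result.append(row)
    return result


print(evaluate(SNF, [0.5, 0.4, 0.1]))
print(apply_policy(SNF, {0: 2})[0])
```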

Proposition 2.7 (cf. Proposition 7.3 of [15]). Every max/minPPS, x = P(x), can be transformed in P-time to an "equivalent" max/minPPS, y = Q(y), in SNF form, such that $|Q| \in O(|P|)$. More precisely, the variables x are a subset of the variables y, the LFP of x = P(x) is the projection of the LFP of y = Q(y), and an optimal policy (respectively, $\varepsilon$-optimal policy) for x = P(x) can be obtained in P-time from an optimal (resp., $\varepsilon$-optimal) policy of y = Q(y).

Proof. We can easily convert, in P-time, any max/minPPS into SNF form, using the following 
procedure. 

• For each equation $x_i = P_i(x) = \max\{p_1(x), \ldots, p_m(x)\}$, for each $p_j(x)$ on the right-hand side that is not a variable, add a new variable $x_k$, replace $p_j(x)$ with $x_k$ in $P_i(x)$, and add the new equation $x_k = p_j(x)$. Do similarly if $P_i(x) = \min\{p_1(x), \ldots, p_m(x)\}$.

• If $P_i(x) = \max\{x_{j_1}, \ldots, x_{j_m}\}$ with $m > 2$, then add $m - 2$ new variables $x_{i_1}, \ldots, x_{i_{m-2}}$, set $P_i(x) = \max\{x_{j_1}, x_{i_1}\}$, and add the equations $x_{i_1} = \max\{x_{j_2}, x_{i_2}\}$, $x_{i_2} = \max\{x_{j_3}, x_{i_3}\}$, ..., $x_{i_{m-2}} = \max\{x_{j_{m-1}}, x_{j_m}\}$. Do similarly if $P_i(x) = \min\{x_{j_1}, \ldots, x_{j_m}\}$ with $m > 2$.

• For each equation $x_i = P_i(x) = \sum_{j=1}^{m} p_j x^{\alpha_j}$, where $P_i(x)$ is a probabilistic polynomial that is not just a constant or a single monomial, replace every monomial $x^{\alpha_j}$ on the right-hand side that is not a single variable by a new variable $x_{i_j}$ and add the equation $x_{i_j} = x^{\alpha_j}$.

• For each variable $x_i$ that occurs in some polynomial with exponent higher than 1, introduce new variables $x_{i_1}, \ldots, x_{i_k}$, where k is the (base-2, rounded down) logarithm of the highest exponent of $x_i$ that occurs in P(x), and add equations $x_{i_1} = x_i^2$, $x_{i_2} = x_{i_1}^2$, ..., $x_{i_k} = x_{i_{k-1}}^2$. For every occurrence of a higher power $x_i^l$, $l > 1$, of $x_i$ in P(x), if the binary representation of the exponent l is $a_k \ldots a_2 a_1 a_0$, then we replace $x_i^l$ by the product of the variables $x_{i_j}$ such that the corresponding bit $a_j$ is 1, and $x_i$ if $a_0 = 1$. After we perform this replacement for all the higher powers of all the variables, every polynomial of total degree $\ge 2$ is just a product of variables.

• If a polynomial $P_i(x) = x_{j_1} \cdots x_{j_m}$ in the current system is the product of $m > 2$ variables, then add $m - 2$ new variables $x_{i_1}, \ldots, x_{i_{m-2}}$, set $P_i(x) = x_{j_1} x_{i_1}$, and add the equations $x_{i_1} = x_{j_2} x_{i_2}$, $x_{i_2} = x_{j_3} x_{i_3}$, ..., $x_{i_{m-2}} = x_{j_{m-1}} x_{j_m}$.

Now all equations are of the form L, Q, or M.

The above procedure allows us to convert any max/minPPS into one in SNF form by introducing $O(|P|)$ new variables and blowing up the size of P by a constant factor $O(1)$. Furthermore, there is an obvious (and easy to compute) bijection between policies for the resulting SNF form max/minPPS and the original max/minPPS. □
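The repeated-squaring step of the conversion (fourth bullet of the proof) can be sketched as follows. The function name and the naming scheme for fresh variables are hypothetical; when more than two factors remain, the last bullet (splitting a product into binary products) would still have to be applied.

```python
from math import prod


def power_gadget(var: str, exponent: int):
    """Express var**exponent via repeated squaring, as in the SNF conversion step.

    Returns (equations, factors): each equation (new, old) stands for new = old * old
    (a Form Q equation), and var**exponent equals the product of the variables in factors.
    The names var_1, ..., var_k are hypothetical fresh variables.
    """
    if exponent < 1:
        raise ValueError("exponent must be a positive integer")
    k = exponent.bit_length() - 1                            # floor(log2(exponent))
    names = [var] + [f"{var}_{j}" for j in range(1, k + 1)]  # names[j] = var**(2**j)
    equations = [(names[j], names[j - 1]) for j in range(1, k + 1)]
    factors = [names[j] for j in range(k + 1) if (exponent >> j) & 1]
    return equations, factors


equations, factors = power_gadget("x", 5)
print(equations)  # [('x_1', 'x'), ('x_2', 'x_1')]
print(factors)    # ['x', 'x_2']

# Numerical check: evaluate the new variables at x = 0.7 and multiply the factors.
values = {"x": 0.7}
for new, old in equations:
    values[new] = values[old] * values[old]
assert abs(prod(values[f] for f in factors) - 0.7 ** 5) < 1e-12
```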

Thus from now on, and for the rest of this paper we assume, without loss of generality, that all 
max/minPPSs are in SNF normal form. 

We now summarize some of the main previous results on PPSs and max/minPPSs. 

Proposition 2.8 ([14]). There is a P-time algorithm that, given a minPPS or maxPPS, x = P(x), over n variables, with LFP $q^* \in \mathbb{R}^n$, determines for every $i = 1, \ldots, n$ whether $q^*_i = 0$ or $q^*_i = 1$ or $0 < q^*_i < 1$.

Thus, given a max/minPPS we can find in P-time all the variables $x_i$ such that $q^*_i = 0$ or $q^*_i = 1$, remove them and their corresponding equations $x_i = P_i(x)$, and substitute their values on the RHS of the remaining equations. This yields a new max/minPPS, $x' = P'(x')$, whose LFP solution $q'^*$ satisfies $0 < q'^* < 1$ and corresponds to the remaining coordinates of $q^*$. Thus, it suffices to focus our attention on systems whose LFP is strictly between 0 and 1.
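The elimination step can be sketched for SNF maxPPSs as follows, assuming the coordinates with $q^*_i = 0$ and $q^*_i = 1$ have already been found (for example by the algorithm of Proposition 2.8, which is not implemented here). The tuple encoding and function name are hypothetical; for a minPPS one would instead use $\min\{0, x\} = 0$ and $\min\{1, x\} = x$.

```python
from fractions import Fraction

# SNF rows (illustrative encoding): ("L", a0, {j: a_ij}), ("Q", j, k), ("M", j, k) for max.


def eliminate(system, known):
    """Substitute known LFP values (0 or 1), drop their equations, and renumber variables.

    `known` maps an index i to q*_i in {0, 1}, as determined by a qualitative analysis.
    """
    keep = [i for i in range(len(system)) if i not in known]
    new_index = {old: new for new, old in enumerate(keep)}

    def linear(a0, coeffs):
        """Form L row a0 + sum_j a_j x_j with known variables folded into the constant."""
        out = {}
        for j, a in coeffs.items():
            if j in known:
                a0 += a * known[j]
            elif a != 0:
                out[new_index[j]] = out.get(new_index[j], 0) + a
        return ("L", a0, out)

    reduced = []
    for i in keep:
        kind, *args = system[i]
        if kind == "L":
            reduced.append(linear(*args))
        elif kind == "Q":
            j, k = args
            if j in known:                 # x_j x_k = c * x_k for the constant c = q*_j
                reduced.append(linear(0, {k: known[j]}))
            elif k in known:
                reduced.append(linear(0, {j: known[k]}))
            else:
                reduced.append(("Q", new_index[j], new_index[k]))
        elif kind == "M":
            j, k = args
            if j in known and k in known:
                reduced.append(("L", max(known[j], known[k]), {}))
            elif j in known or k in known:
                c, other = (known[j], k) if j in known else (known[k], j)
                # For x in [0, 1]: max{1, x} = 1 and max{0, x} = x.
                reduced.append(("L", 1, {}) if c == 1 else linear(0, {other: 1}))
            else:
                reduced.append(("M", new_index[j], new_index[k]))
        else:
            raise ValueError(f"unknown form {kind!r}")
    return reduced


F = Fraction
system = [
    ("L", F(1, 5), {1: F(3, 10), 2: F(2, 5)}),  # x_0 = 1/5 + 3/10 x_1 + 2/5 x_2
    ("L", F(1), {}),                             # x_1 = 1, so q*_1 = 1
    ("M", 0, 3),                                 # x_2 = max{x_0, x_3}
    ("L", F(0), {3: F(1)}),                      # x_3 = x_3, so q*_3 = 0
]
print(eliminate(system, {1: 1, 3: 0}))
```

The reduced system is $y_0 = 1/2 + (2/5) y_1$, $y_1 = y_0$, whose LFP $y_0 = y_1 = 5/6$ lies strictly between 0 and 1.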

The decision problem of determining whether a coordinate $q^*_i$ of the LFP is $> 1/2$ (or whether $q^*_i > r$ for any other given bound $r \in (0, 1)$) is at least as hard as the Square-Root-Sum and the PosSLP problems even for PPSs (without the min and max operators) [15], and hence it is highly unlikely that it can be solved in P.

The problem of approximating efficiently the LFP of a PPS was solved recently in [11], by using Newton's method after elimination of the variables with value 0 and 1.

Definition 2.9. For a PPS $x = P(x)$ we use $P'(x)$ to denote the Jacobian matrix of partial derivatives of $P(x)$, i.e., $P'(x)_{i,j} := \frac{\partial P_i(x)}{\partial x_j}$. For a point $x \in \mathbb{R}^n$, if $(I - P'(x))$ is non-singular, then we define one Newton iteration at $x$ via the operator:

$$\mathcal{N}(x) = x + (I - P'(x))^{-1}(P(x) - x)$$

Given a max/minPPS, $x = P(x)$, and a policy $\sigma$, we use $\mathcal{N}_\sigma(x)$ to denote the Newton operator of the PPS $x = P_\sigma(x)$; i.e., if $(I - P'_\sigma(x))$ is non-singular at a point $x \in \mathbb{R}^n$, then $\mathcal{N}_\sigma(x) = x + (I - P'_\sigma(x))^{-1}(P_\sigma(x) - x)$.
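To make the operator concrete, here is a minimal floating-point sketch of one Newton iteration and of the plain iteration $x^{(k+1)} := \mathcal{N}(x^{(k)})$ from $x^{(0)} = 0$. The two-variable SNF system and the names `solve`, `newton_step`, `P` and `jac` are illustrative assumptions. The P-time algorithm of [11] additionally rounds the iterates, which this sketch omits.

```python
def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if M[p][c] == 0:
            raise ValueError("I - P'(x) is singular")
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * w for u, w in zip(M[r], M[c])]
    return [M[r][n] / M[r][r] for r in range(n)]

def newton_step(P, jac, x):
    """One Newton iteration N(x) = x + (I - P'(x))^{-1} (P(x) - x)."""
    n = len(x)
    J = jac(x)
    A = [[(1.0 if i == j else 0.0) - J[i][j] for j in range(n)] for i in range(n)]
    d = solve(A, [p - v for p, v in zip(P(x), x)])
    return [v + w for v, w in zip(x, d)]

# Illustrative SNF PPS: x[0] = 0.4 + 0.6*x[1] (form L), x[1] = x[0]*x[0] (form Q).
# Its LFP is q* = (2/3, 4/9).
P = lambda x: [0.4 + 0.6 * x[1], x[0] * x[0]]
jac = lambda x: [[0.0, 0.6], [2.0 * x[0], 0.0]]

x = [0.0, 0.0]
for _ in range(30):
    x = newton_step(P, jac, x)
```

For this system, the iterates increase towards the LFP $(2/3, 4/9)$, and after 30 iterations they agree with it up to floating-point error.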

Theorem 2.10 (Theorem 3.2 and Corollary 4.5 of [11]). Let $x = P(x)$ be a PPS with rational coefficients in SNF form which has least fixed point $0 < q^* < 1$. If we conduct iterations of Newton's method as follows: $x^{(0)} := 0$, and for $k \geq 0$: $x^{(k+1)} := \mathcal{N}(x^{(k)})$, then the Newton operator $\mathcal{N}(x^{(k)})$ is defined for all $k \geq 0$, and for any $j \geq 0$:

$$\|q^* - x^{(j + 4|P|)}\|_\infty \leq 2^{-j}$$

where $|P|$ is the total bit encoding length of the system $x = P(x)$.

Furthermore, there is an algorithm (based on suitable rounding of Newton iterations) which, given a PPS, $x = P(x)$, and given a positive integer $j$, computes a rational vector $v \in [0, 1]^n$ such that $\|q^* - v\|_\infty \leq 2^{-j}$, and which runs in time polynomial in $|P|$ and $j$ in the standard Turing model of computation.

The proof of the theorem involves a number of technical lemmas on PPS and Newton's method, 
several of which we will also need in this paper, some of them in strengthened form. 

Lemma 2.11. (c.f. Lemma 3.6 of [11]) Given a PPS, $x = P(x)$, with LFP $q^* > 0$, if $0 \leq y \leq q^*$, and if $(I - P'(y))^{-1}$ exists and is non-negative (in which case $\mathcal{N}(y)$ is clearly defined), then $\mathcal{N}(y) \leq q^*$ holds.³

Proof. In Lemma 3.4 of [11] it was established that when $(I - P'(y))$ is non-singular, i.e., $(I - P'(y))^{-1}$ is defined, and thus $\mathcal{N}(y)$ is defined, then

$$q^* - \mathcal{N}(y) = (I - P'(y))^{-1} \, \frac{P'(q^*) - P'(y)}{2} \, (q^* - y) \qquad (1)$$

Now, since all polynomials in $P(x)$ have non-negative coefficients, the Jacobian $P'(x)$ is monotone in $x$, and thus, since $y \leq q^*$, we have $P'(q^*) \geq P'(y)$. Thus $(P'(q^*) - P'(y)) \geq 0$, and by assumption $(q^* - y) \geq 0$. Thus, by the assumption that $(I - P'(y))^{-1} \geq 0$, we have by equation (1) that $q^* - \mathcal{N}(y) \geq 0$, i.e., that $q^* \geq \mathcal{N}(y)$. □
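Identity (1) can be checked exactly on a small example. The sketch below assumes a hypothetical one-variable PPS $x = \frac{3}{5}x^2 + \frac{2}{5}$, whose LFP is $q^* = 2/3$, and uses exact rational arithmetic so that both sides of (1) can be compared for equality. In one dimension, $(I - P'(y))^{-1}$ is just division by $1 - P'(y)$.

```python
from fractions import Fraction as F

# Hypothetical one-variable PPS x = P(x) = 3/5 x^2 + 2/5, with LFP q* = 2/3.
P = lambda x: F(3, 5) * x * x + F(2, 5)
dP = lambda x: F(6, 5) * x            # P'(x)
q_star = F(2, 3)

def newton(y):
    """N(y) = y + (1 - P'(y))^{-1} (P(y) - y)."""
    return y + (P(y) - y) / (1 - dP(y))

def rhs_of_eq1(y):
    """(I - P'(y))^{-1} * (P'(q*) - P'(y)) / 2 * (q* - y), in the scalar case."""
    return (dP(q_star) - dP(y)) / 2 * (q_star - y) / (1 - dP(y))

for y in [F(0), F(1, 3), F(3, 5)]:    # points with 0 <= y <= q*
    assert q_star - newton(y) == rhs_of_eq1(y)   # identity (1), exactly
    assert newton(y) <= q_star                    # conclusion of Lemma 2.11
```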

We also need the following, which is a less immediate consequence of results in [11]:

Lemma 2.12. Given a PPS, $x = P(x)$, with LFP $q^* > 0$, if $0 \leq y \leq q^*$, and $y < 1$, then $(I - P'(y))^{-1}$ exists and is non-negative.

The proof of this lemma is more involved and is given in the appendix. To prove the polynomial-time upper bounds in [11], an inductive step of the following form was used:

Lemma 2.13 (Combining Lemma 3.7 and Lemma 3.5 of [11]). Let $x = P(x)$ be a PPS in SNF with $0 < q^* < 1$. For any $0 \leq x \leq q^*$ and $\lambda > 0$, the operator $\mathcal{N}(x)$ is defined, $\mathcal{N}(x) \leq q^*$, and if $q^* - x \leq \lambda(1 - q^*)$ then $q^* - \mathcal{N}(x) \leq \frac{\lambda}{2}(1 - q^*)$.

If we knew an optimal policy $\tau$ for a max/minPPS, $x = P(x)$, then we would be able to solve the problem of computing the LFP of the max/minPPS using the algorithm in [11] for approximating $q^*_\tau$, because we know $q^* = q^*_\tau$. Unfortunately, we do not know which policy is optimal. There are exponentially many policies, so it would be inefficient to run this algorithm using every policy. (And even if we did so for each possible policy, we would only be able to $\epsilon$-approximate the values $q^*_\sigma$ for each policy $\sigma$ using the results of [11], say for $\epsilon = 2^{-j}$ for some chosen $j$; therefore we could only be sure that a particular policy that yields the best result is, say, $(2\epsilon)$-optimal, but it need not be optimal.) In fact, as we will see, it is probably impossible to identify an optimal policy in polynomial time.

³ Note that the Lemma does not claim that $\mathcal{N}(y) \geq 0$ holds. Indeed, it may not.

Our goal instead will be to find an iteration $I(x)$ for max/minPPSs that has similar properties to the Newton operator for PPSs, i.e., that can be computed efficiently for a given $x$, and for which we can prove a property similar to Lemma 2.13, i.e., such that if $q^* - x \leq \lambda(1 - q^*)$, then $q^* - I(x) \leq \frac{\lambda}{2}(1 - q^*)$. Once we do so, we will be able to adapt and extend results from [11] to get a polynomial time algorithm for the problem of approximating the LFP $q^*$ of a max/minPPS.

3 Generalizing Newton's method using linear programming 

If a max/minPPS, $x = P(x)$, has no equations of form Q, then it amounts to precisely the Bellman equations for an ordinary finite-state Markov Decision Process with the objective of maximizing/minimizing reachability probabilities. It is well known that we can compute the exact (rational) optimal values for such finite-state MDPs, and thus the exact LFP, $q^*$, for such max(min)-linear systems, using linear programming (see, e.g., [20, 16]).

Computing the LFP of max/minPPSs is clearly a generalization of this finite-state MDP problem to the infinite-state setting of branching and recursive MDPs. If we have no equations of form M, we have a PPS, which we can solve in P-time using Newton's method, as shown recently in [11]. An iteration of Newton's method works by approximating the system of equations by a linear system. For a maxPPS (or minPPS), we will define an analogous "approximate" system of equations that we have to solve in each iteration of "Generalized Newton's Method" (GNM), which has both linear equations and equations involving the max (or min) function. We will show that we can solve the equations that arise from each iteration of GNM using linear programming. We will then show that a polynomial (in fact, linear) number of iterations is enough to approximate the desired LFP solution, and that it suffices to carry out the computations with polynomial precision.

The rest of this Section is organized as follows. In Section 3.1 we define a linearization of a max/minPPS and prove some basic properties. In Section 3.2 we define the operator for an iteration of the Generalized Newton's Method and show that it can be computed by linear programming. In Section 3.3 we analyze the operator for maxPPSs and in Section 3.4 for minPPSs. Finally, in Section 3.5 we put everything together and show that the algorithm approximates the LFP within any desired precision in polynomial time in the Turing model.

3.1 Linearizations of max/minPPSs and their properties 

We begin by expressing the max/min linear equations that should be solved by one iteration of what 
will eventually become the "Generalized Newton's Method" (GNM), applied at a point y. Recall 
that we assume w.l.o.g. throughout that max/minPPS and PPS are in SNF. 

Definition 3.1. For a max/minPPS, $x = P(x)$, with $n$ variables, the linearization of $P(x)$ at a point $y \in \mathbb{R}^n$ is a system of max/min linear functions denoted by $P^y(x)$, which has the following form:

if $P(x)_i$ has form L or M, then $P^y_i(x) = P_i(x)$, and

if $P(x)_i$ has form Q, i.e., $P(x)_i = x_j x_k$ for some $j, k$, then

$$P^y_i(x) = y_j x_k + x_j y_k - y_j y_k$$
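A minimal sketch of evaluating $P^y(x)$ follows. It assumes a hypothetical tuple encoding of SNF equations, chosen here for illustration. The sketch also checks, on a two-variable PPS, the identities $P^y(y) = P(y)$ and $P^y(x) = P(y) + P'(y)(x - y)$ that are proved as Lemma 3.4 below.

```python
from fractions import Fraction as F

# Hypothetical encoding of SNF equations x_i = P_i(x), for illustration only:
#   ('L', c, ((j, a_j), ...))  ->  c + sum_j a_j x_j
#   ('Q', j, k)                ->  x_j * x_k
#   ('M', j, k)                ->  max{x_j, x_k} (mode='max') or min{x_j, x_k} (mode='min')

def P(eqs, x, mode='max'):
    """Evaluate the right-hand sides P(x)."""
    op = max if mode == 'max' else min
    out = []
    for e in eqs:
        if e[0] == 'L':
            out.append(e[1] + sum(a * x[j] for j, a in e[2]))
        elif e[0] == 'Q':
            out.append(x[e[1]] * x[e[2]])
        else:
            out.append(op(x[e[1]], x[e[2]]))
    return out

def P_lin(eqs, y, x, mode='max'):
    """Linearization P^y(x) of Definition 3.1: only form-Q equations change."""
    out = P(eqs, x, mode)
    for i, e in enumerate(eqs):
        if e[0] == 'Q':
            _, j, k = e
            out[i] = y[j] * x[k] + x[j] * y[k] - y[j] * y[k]
    return out

# A small PPS: x0 = 2/5 + 3/5 x1 (form L), x1 = x0 * x0 (form Q).
eqs = [('L', F(2, 5), ((1, F(3, 5)),)), ('Q', 0, 0)]
y, x = [F(1, 3), F(1, 4)], [F(1, 2), F(1, 5)]
Py = P(eqs, y)
assert P_lin(eqs, y, y) == Py                      # P^y agrees with P at y
# P^y(x) = P(y) + P'(y)(x - y), where P'(y) = [[0, 3/5], [2 y0, 0]]
assert P_lin(eqs, y, x) == [Py[0] + F(3, 5) * (x[1] - y[1]),
                            Py[1] + 2 * y[0] * (x[0] - y[0])]
```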
We can consider the linearization of a PPS, $x = P_\sigma(x)$, obtained as the result of fixing a policy, $\sigma$, for a max/minPPS, $x = P(x)$.

Definition 3.2. $P^y_\sigma(x) := (P_\sigma)^y(x)$.

Note that the linearization $P^y(x)$ only changes equations of form Q, and using a policy $\sigma$ only changes equations of form M, so these operations are independent in terms of the effects they have on the underlying equations, and thus $P^y_\sigma(x) = (P_\sigma)^y(x) = (P^y)_\sigma(x)$.

Lemma 3.3. Let $x = P(x)$ be any PPS. For any $y \in \mathbb{R}^n$, let $(P^y)'(x)$ denote the Jacobian matrix of $P^y(x)$. Then for any $x \in \mathbb{R}^n$, we have $(P^y)'(x) = P'(y)$.

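Proof. We need to show that the Jacobian $(P^y)'(x)$ of $P^y(x)$, evaluated anywhere, is equal to $P'(y)$. If $x_i = P_i(x)$ is not of form Q, then, for any $x \in \mathbb{R}^n$, $P_i(x) = P^y_i(x)$. So for any $x_j$, $\frac{\partial P^y_i(x)}{\partial x_j} = \frac{\partial P_i(x)}{\partial x_j}$. Otherwise, $x_i = P_i(x)$ has form Q, that is, $P_i(x) = x_j x_k$ for some variables $x_j, x_k$. Then $P^y_i(x) = y_j x_k + x_j y_k - y_j y_k$. In this case $\frac{\partial P^y_i(x)}{\partial x_j} = y_k$ and $\frac{\partial P^y_i(x)}{\partial x_k} = y_j$. But when $x = y$, $\frac{\partial P_i(x)}{\partial x_j} = y_k$ and $\frac{\partial P_i(x)}{\partial x_k} = y_j$. Furthermore, clearly for any $x_l$ with $l \neq j$ and $l \neq k$, $\frac{\partial P^y_i(x)}{\partial x_l} = 0$ and $\frac{\partial P_i(x)}{\partial x_l} = 0$. We have thus established that $(P^y)'(x) = P'(y)$ for any $x \in \mathbb{R}^n$. □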

Lemma 3.4. If $x = P(x)$ is any PPS, then for any $x, y \in \mathbb{R}^n$, $P^y(x) = P(y) + P'(y)(x - y)$.

Proof. Firstly, note that $P^y(x) = P^y(y) + (P^y)'(x)(x - y)$, since the functions $P^y_i(x)$ are all linear in $x$. Next, observe that $P_i(y) = P^y_i(y)$ for all $i$, and thus that $P(y) = P^y(y)$. Thus, to show that $P^y(x) = P^y(y) + P'(y)(x - y) = P(y) + P'(y)(x - y)$, all we need to show is that the Jacobian $(P^y)'(x)$ of $P^y(x)$, evaluated anywhere, is equal to $P'(y)$. But this was established in Lemma 3.3. □

An iteration of Newton's method on $x = P_\sigma(x)$ at a point $y$ solves a system of linear equations that can be expressed in terms of $P^y_\sigma(x)$. Part (i) of the next lemma establishes this basic fact. Part (ii) provides us with conditions under which we are guaranteed to be doing "at least as well" as one such Newton iteration.

Lemma 3.5. Suppose that the matrix inverse $(I - P'_\sigma(y))^{-1}$ exists and is non-negative, for some policy $\sigma$ and some $y \in \mathbb{R}^n$. Then

(i) $\mathcal{N}_\sigma(y)$ is defined, and is equal to the unique point $a \in \mathbb{R}^n$ such that $P^y_\sigma(a) = a$.

(ii) For any vector $x \in \mathbb{R}^n$:

If $P^y_\sigma(x) \geq x$, then $x \leq \mathcal{N}_\sigma(y)$.
If $P^y_\sigma(x) \leq x$, then $x \geq \mathcal{N}_\sigma(y)$.

Proof. (i): We define:

$$a = y + (I - P'_\sigma(y))^{-1}(P_\sigma(y) - y) = \mathcal{N}_\sigma(y)$$

Then we can re-arrange this expression, reversibly, yielding:

$$a = y + (I - P'_\sigma(y))^{-1}(P_\sigma(y) - y) \iff P_\sigma(y) - y - (I - P'_\sigma(y))(a - y) = 0$$

$$\iff P_\sigma(y) + P'_\sigma(y)(a - y) = a$$
$$\iff P^y_\sigma(a) = a \quad \text{(by Lemma 3.4)}$$
Uniqueness follows from the reversibility of these transformations.

(ii): Firstly, we observe that applying Newton's method to solve $x = P^y_\sigma(x)$, with any initial point $x$, gives us $\mathcal{N}_\sigma(y) = a$ in a single iteration. Recall from Lemma 3.3 that the following equality holds between the Jacobians: $(P^y_\sigma)'(x) = P'_\sigma(y)$. Hence one iteration of Newton's method applied to $x = P^y_\sigma(x)$ can be equivalently defined as:

$$x + (I - P'_\sigma(y))^{-1}(P^y_\sigma(x) - x) = x + (I - P'_\sigma(y))^{-1}(P_\sigma(y) + P'_\sigma(y)(x - y) - x)$$

$$= (I - P'_\sigma(y))^{-1}(x - P'_\sigma(y)x + P_\sigma(y) + P'_\sigma(y)(x - y) - x)$$

$$= (I - P'_\sigma(y))^{-1}(P_\sigma(y) - P'_\sigma(y)y)$$

$$= (I - P'_\sigma(y))^{-1}((I - P'_\sigma(y))y + P_\sigma(y) - y)$$

$$= y + (I - P'_\sigma(y))^{-1}(P_\sigma(y) - y)$$

$$= \mathcal{N}_\sigma(y).$$

We thus have $\mathcal{N}_\sigma(y) = x + (I - P'_\sigma(y))^{-1}(P^y_\sigma(x) - x)$. By assumption, $(I - P'_\sigma(y))^{-1}$ is a non-negative matrix. So if $P^y_\sigma(x) - x \geq 0$ then $\mathcal{N}_\sigma(y) \geq x$, whereas if $P^y_\sigma(x) - x \leq 0$ then $\mathcal{N}_\sigma(y) \leq x$. □



3.2 The iteration operator of Generalized Newton's Method 

We shall now define distinct iteration operators for a maxPPS and a minPPS, both of which we 
shall refer to with the overloaded notation I(x). (We shall also establish in the next two subsections 
that the operators are well-defined in their respective settings.) These operators will serve as the 
basis for a Generalized Newton's Method to be applied to maxPPSs and minPPSs, respectively. 

Definition 3.6. For a maxPPS, $x = P(x)$, with LFP $q^*$, such that $0 < q^* < 1$, and for a real vector $y$ such that $0 \leq y \leq q^*$, we define the operator $I(y)$ to be the unique optimal solution, $a \in \mathbb{R}^n$, to the following mathematical program: Minimize: $\sum_i a_i$; Subject to: $P^y(a) \leq a$.

For a minPPS, $x = P(x)$, with LFP $q^*$, such that $0 < q^* < 1$, and for a real vector $y$ such that $0 \leq y \leq q^*$, we define the operator $I(y)$ to be the unique optimal solution $a \in \mathbb{R}^n$ to the following mathematical program: Maximize: $\sum_i a_i$; Subject to: $P^y(a) \geq a$.

A priori, it is not even clear whether the above "definitions" of $I(x)$ for maxPPSs and minPPSs are well-defined. We now make the following central claim, which we shall prove separately for maxPPSs and minPPSs in the following two subsections:

Proposition 3.7. Let $x = P(x)$ be a max/minPPS, with LFP $q^*$, such that $0 < q^* < 1$. For any $0 \leq x \leq q^*$:

1. $I(x)$ is well-defined, and $I(x) \leq q^*$, and:

2. For any $\lambda > 0$, if $q^* - x \leq \lambda(1 - q^*)$ then $q^* - I(x) \leq \frac{\lambda}{2}(1 - q^*)$.

The next proposition observes that linear programming can be used to compute an iteration of the operator, $I(x)$, for both maxPPSs and minPPSs.

Proposition 3.8. Given a max/minPPS, $x = P(x)$, with LFP $q^*$, and given a rational vector $y$, $0 \leq y \leq q^*$, the constrained optimization problem (i.e., mathematical program) "defining" $I(y)$ can be described by an LP whose encoding size is polynomial (in fact, linear) in both $|P|$ and the encoding size of the rational vector $y$. Thus, we can compute the (unique) optimal solution $I(y)$ to such an LP (assuming it exists and is unique) in P-time.
Proof. For a maxPPS (minPPS), the definition of $I(x)$ asks us to minimize (maximize) the linear objective $\sum_i a_i$, subject to the constraints $P^y(a) \leq a$ ($P^y(a) \geq a$, respectively). All of these constraints are linear, except the constraints of form M. For a maxPPS, if $(P^y(a))_i$ is of form M, then the corresponding constraint is an inequality of the form $\max\{a_j, a_k\} \leq a_i$. Such an inequality is equivalent to, and can be replaced by, the two linear inequalities $a_j \leq a_i$ and $a_k \leq a_i$. Likewise, for a minPPS, if $(P^y(a))_i$ is of form M, then the corresponding constraint is an inequality of the form $\min\{a_j, a_k\} \geq a_i$. Again, such an inequality is equivalent to, and can be replaced by, two linear inequalities: $a_j \geq a_i$ and $a_k \geq a_i$.

Thus, for a rational vector $y$ whose encoding length is $\mathrm{size}(y)$, the operator $I(y)$ can be formulated (for both maxPPSs and minPPSs) as the problem of computing the unique optimal solution to a linear program whose encoding size is polynomial (in fact, linear) in $|P|$ and in $\mathrm{size}(y)$. □
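The reduction in this proof can be sketched by building the constraint matrix explicitly. The equation encoding and the function name `gnm_lp_rows` are illustrative assumptions. The rows come out in the common form $A a \leq b$, with a cost vector $c$ to be minimized, so they could be handed to an LP solver. For example, after conversion to floats, they could be passed to SciPy as `scipy.optimize.linprog(c, A_ub=A, b_ub=b, bounds=(None, None))`. Bounds must be given explicitly because `linprog` defaults to $a \geq 0$, which the program above does not impose. A floating-point solver is not the exact P-time LP algorithm assumed in the text.

```python
from fractions import Fraction as F

# Hypothetical encoding of SNF equations x_i = P_i(x):
#   ('L', c, ((j, a_j), ...))  ->  c + sum_j a_j x_j
#   ('Q', j, k)                ->  x_j * x_k
#   ('M', j, k)                ->  max{x_j, x_k} (mode='max') or min{x_j, x_k} (mode='min')

def gnm_lp_rows(eqs, y, mode):
    """Return (c, A, b) such that I(y) minimizes c.a subject to A a <= b.

    mode='max': minimize sum(a) s.t. P^y(a) <= a.
    mode='min': maximize sum(a) s.t. P^y(a) >= a, stated as minimizing -sum(a).
    """
    n = len(eqs)
    s = 1 if mode == 'max' else -1          # multiply '>=' rows by -1 for minPPSs
    A, b = [], []

    def add_row(terms, const, i):
        # Encodes  s * (sum_j coeff_j a_j + const - a_i) <= 0.
        r = [F(0)] * n
        for j, coeff in terms:
            r[j] += s * coeff
        r[i] -= s
        A.append(r)
        b.append(-s * const)

    for i, e in enumerate(eqs):
        if e[0] == 'L':
            add_row(e[2], e[1], i)
        elif e[0] == 'Q':                   # linearized: y_j a_k + a_j y_k - y_j y_k
            _, j, k = e
            add_row(((k, y[j]), (j, y[k])), -y[j] * y[k], i)
        else:                               # max{a_j,a_k} <= a_i (or min >= a_i): two rows
            _, j, k = e
            add_row(((j, F(1)),), F(0), i)
            add_row(((k, F(1)),), F(0), i)
    c = [F(s)] * n
    return c, A, b

# Example: x0 = 1/2 + 1/2 x1, x1 = x0 * x0, x2 = max{x0, x1}, linearized at y = 0.
eqs = [('L', F(1, 2), ((1, F(1, 2)),)), ('Q', 0, 0), ('M', 0, 1)]
c, A, b = gnm_lp_rows(eqs, [F(0)] * 3, 'max')   # 4 rows: at most 2n, as in the text
```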

3.3 An iteration of Generalized Newton's Method (GNM) for maxPPSs 

For a maxPPS, $x = P(x)$, we know by Theorem 2.5 that there exists an optimal policy, $\tau$, such that $q^* = q^*_\tau \geq q^*_\sigma$ for any policy $\sigma$. The next lemma implies part (i) of Proposition 3.7 for maxPPSs:

Lemma 3.9. If $x = P(x)$ is a maxPPS, with LFP solution $0 < q^* < 1$, and $y$ is a real vector with $0 \leq y \leq q^*$, then $x = P^y(x)$ has a least fixed point solution, denoted $\mu P^y$, with $\mu P^y \leq q^*$. Furthermore, the operator $I(y)$ is well-defined, $I(y) = \mu P^y \leq q^*$, and for any optimal policy $\tau$,

$$I(y) = \mu P^y \geq \mathcal{N}_\tau(y).$$

Proof. Recall that (by Proposition 3.8) the following can be written as an LP that "defines" $I(y)$:

$$\text{Minimize: } \textstyle\sum_i a_i; \quad \text{Subject to: } P^y(a) \leq a \qquad (2)$$

Firstly, we show that the LP constraints $P^y(a) \leq a$ in the definition of $I(y)$ are feasible. We do so by showing that actually $P^y(q^*) \leq q^*$. At any coordinate $i$, if $P_i(x)$ has form M or L, then $P^y_i(q^*) = P_i(q^*) = q^*_i$. Otherwise, $P_i(x)$ has form Q, i.e., $P_i(x) = x_j x_k$, and then

$$P^y_i(q^*) = y_j q^*_k + q^*_j y_k - y_j y_k = q^*_j q^*_k - (q^*_j - y_j)(q^*_k - y_k) \leq q^*_j q^*_k = q^*_i \quad (\text{since } y \leq q^*).$$

Next we show that the LP (2) defining $I(y)$ is bounded. Recall that, by Theorem 2.5, there is always an optimal policy for any maxPPS, $x = P(x)$.

Claim 3.10. Let $x = P(x)$ be any maxPPS, with $0 < q^* < 1$, and let $\tau$ be any optimal policy for $x = P(x)$. For any $y$ such that $0 \leq y \leq q^*$, we have that $\mathcal{N}_\tau(y)$ is defined, and for any vector $a$, if $P^y(a) \leq a$ then $\mathcal{N}_\tau(y) \leq a$. In particular, $\mathcal{N}_\tau(y) \leq q^*$.

Proof. Recall, from our definition of an optimal policy, that $q^* = q^*_\tau$ is also the least non-negative solution to $x = P_\tau(x)$. So we can apply Lemma 2.12 using $x = P_\tau(x)$ and $y \leq q^*$ (note that $y < 1$, since $q^* < 1$) to deduce that $(I - P'_\tau(y))^{-1}$ exists and is non-negative. Thus $\mathcal{N}_\tau(y)$ is defined. Now, by Lemma 3.5 (ii), to show that $a \geq \mathcal{N}_\tau(y)$ all we need to show is that $P^y_\tau(a) \leq a$. But recalling that $x = P(x)$ is a maxPPS, by the definitions of $P^y_\tau(x)$ and $P^y(x)$, we have $P^y_\tau(a) \leq P^y(a) \leq a$. We have just shown before this Claim that $P^y(q^*) \leq q^*$, and thus $\mathcal{N}_\tau(y) \leq q^*$. □
Thus the LP (2) defining $I(y)$ is both feasible and bounded, hence it has an optimal solution. To show that $I(y)$ is well-defined, all that remains is to show that this optimal solution is unique. In the process, we will also show that $I(y)$ defines precisely the least fixed point solution of $x = P^y(x)$, which we denote by $\mu P^y$.

Firstly, we claim that for any optimal solution $b$ to the LP (2), it must be the case that $P^y(b) = b$. Suppose not. Then there exists $i$ such that $P^y(b)_i < b_i$, and we can define a new vector $b'$ such that $b'_i = P^y(b)_i$ and $b'_j = b_j$ for all $j \neq i$. By monotonicity of $P^y(x)$, it is clear that $P^y(b') \leq b'$, and thus that $b'$ is a feasible solution to the LP (2). But $\sum_i b'_i < \sum_i b_i$, contradicting the assumption that $b$ is an optimal solution to the LP (2).

Secondly, we claim that there is a unique optimal solution. Suppose not: suppose $b$ and $c$ are two distinct optimal solutions to the LP (2). Define a new vector $d$ by $d_i = \min\{b_i, c_i\}$, for all $i$. Clearly, $d \leq b$ and $d \leq c$. Thus, by the monotonicity of $P^y(x)$, for all $i$, $P^y(d)_i \leq P^y(b)_i = b_i$, and likewise $P^y(d)_i \leq P^y(c)_i = c_i$. Thus $P^y(d) \leq d$, and $d$ is a feasible solution to the LP. But since $b$ and $c$ are distinct, and yet $\sum_i b_i = \sum_i c_i$, we have $\sum_i d_i < \sum_i b_i = \sum_i c_i$, contradicting the optimality of both $b$ and $c$.

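The same argument, applied to $d = \min\{b, c\}$ for the optimal solution $b$ and any feasible solution $c$, shows that $b \leq c$. We have thus established that $I(y)$ defines the unique least fixed point solution of $x = P^y(x)$, which we also denote by $\mu P^y$. Since $q^*$ is also a feasible solution of the LP, we have $\mu P^y \leq q^*$.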

Finally, by Claim 3.10, it must be the case that $I(y) = \mu P^y \geq \mathcal{N}_\tau(y)$, where $\tau$ is any optimal policy for $x = P(x)$. □

We next establish part (ii) of Proposition 3.7 for maxPPSs.

Lemma 3.11. Let $x = P(x)$ be a maxPPS with $0 < q^* < 1$. For any $0 \leq x \leq q^*$ and $\lambda > 0$, we have $I(x) \leq q^*$, and furthermore if:

$$q^* - x \leq \lambda(1 - q^*)$$

then

$$q^* - I(x) \leq \frac{\lambda}{2}(1 - q^*)$$

Proof. Let $\tau$ be an optimal policy (which exists by Theorem 2.5). The least fixed point solution of the PPS $x = P_\tau(x)$ is $q^*$. From our assumptions, Lemma 2.13 gives that $q^* - \mathcal{N}_\tau(x) \leq \frac{\lambda}{2}(1 - q^*)$. But by Lemma 3.9, $\mathcal{N}_\tau(x) \leq I(x) \leq q^*$. The claim follows. □

Proposition 3.7 for maxPPSs follows from Lemmas 3.9 and 3.11. In Section 3.5 we will combine this result with methods from [11] to obtain a P-time algorithm for approximating the LFP of a maxPPS, in the standard Turing model of computation.

3.4 An iteration of GNM for minPPSs 

Our proof of the minPPS version of Lemma 3.11 will be somewhat different, because it turns out we cannot use the same argument based on LPs to prove that $I(y)$ is well-defined. Fortunately, in the case of minPPSs, we can show that $(I - P'_\sigma(y))^{-1}$ exists and is non-negative for any policy $\sigma$, at those points $y$ that are of interest. We can use this to show that there is some policy, $\sigma$, such that $I(y)$ is equivalent to an iteration of Newton's method at $y$ after fixing the policy $\sigma$. We shall establish the existence of such a policy using a policy improvement argument, instead of just using the LP, as we did for maxPPSs. (Note that the policy improvement algorithm may not be an efficient (P-time) way to compute such a policy, and we do not claim it is. We only use policy improvement as an argument in the proof of existence of a suitable policy $\sigma$.)
Lemma 3.12. For a minPPS, $x = P(x)$, and for any policy $\sigma$, the LFP of $x = P_\sigma(x)$, denoted $q^*_\sigma$, satisfies $q^* \leq q^*_\sigma$.

Proof. By Theorem 2.5 there is an optimal policy $\tau$ with $q^* = q^*_\tau$. But we defined an optimal policy for a minPPS as one with $q^*_\tau \leq q^*_\sigma$ for any policy $\sigma$. So $q^* = q^*_\tau \leq q^*_\sigma$. □

Lemma 3.12 allows us to use Lemma 2.12 with any policy, not just with optimal policies:

Lemma 3.13. For a minPPS, $x = P(x)$, with LFP $0 < q^* < 1$, for any $0 \leq y \leq q^*$ and any policy $\sigma$, $(I - P'_\sigma(y))^{-1}$ exists and is non-negative. Thus $\mathcal{N}_\sigma(y)$ is also defined.

Proof. $0 \leq y \leq q^* \leq q^*_\sigma \leq 1$. Note also that $y < 1$, and that $q^*_\sigma \geq q^* > 0$. This is all we need for Lemma 2.12 to apply. □

Lemma 3.14. Given a minPPS, $x = P(x)$, with LFP $0 < q^* < 1$, and a vector $y$ with $0 \leq y \leq q^*$, there is a policy $\sigma$ such that $P^y(\mathcal{N}_\sigma(y)) = \mathcal{N}_\sigma(y)$.

Proof. We use a policy (strategy) improvement "algorithm" to prove this. Start with any policy $\sigma_1$. At step $i$, suppose we have a policy $\sigma_i$.

For notational simplicity, in the following we use the abbreviation $z = \mathcal{N}_{\sigma_i}(y)$. By Lemma 3.5 (i), $P^y_{\sigma_i}(z) = z$. So we have $P^y(z) \leq z$. If $P^y(z) = z$, then stop: we are done.

Otherwise, to construct the next policy $\sigma_{i+1}$, take the smallest $j$ such that $(P^y(z))_j < z_j$. Note that $P_j(x)$ has form M, because otherwise $(P(x))_j = (P_{\sigma_i}(x))_j$. Thus, there is some variable $x_k$ with $P_j(x) = \min\{x_k, x_{\sigma_i(j)}\}$ and $z_k < z_{\sigma_i(j)}$. Define $\sigma_{i+1}$ to be:

$$\sigma_{i+1}(l) = \begin{cases} k & \text{if } l = j \\ \sigma_i(l) & \text{otherwise} \end{cases}$$

Then $(P^y_{\sigma_{i+1}}(z))_j < z_j$, but for every other coordinate $l \neq j$, $(P^y_{\sigma_{i+1}}(z))_l = (P^y_{\sigma_i}(z))_l = z_l$. Thus

$$P^y_{\sigma_{i+1}}(z) \leq z \qquad (3)$$

By Lemma 3.13, $\mathcal{N}_{\sigma_{i+1}}(y)$ is defined. Moreover, inequality (3), together with Lemma 3.5 (ii), yields that $\mathcal{N}_{\sigma_{i+1}}(y) \leq z$. But $\mathcal{N}_{\sigma_{i+1}}(y) \neq z$, because $P^y_{\sigma_{i+1}}(z) \neq z$, whereas, by Lemma 3.5 (i), we have $P^y_{\sigma_{i+1}}(\mathcal{N}_{\sigma_{i+1}}(y)) = \mathcal{N}_{\sigma_{i+1}}(y)$.

Thus this algorithm gives us a sequence of policies $\sigma_1, \sigma_2, \ldots$ with $\mathcal{N}_{\sigma_1}(y) \geq \mathcal{N}_{\sigma_2}(y) \geq \mathcal{N}_{\sigma_3}(y) \geq \ldots$, where furthermore each step must strictly decrease at least one coordinate of $\mathcal{N}_{\sigma_i}(y)$. It follows that $\sigma_i \neq \sigma_j$, unless $i = j$. There are only finitely many policies. So the sequence must be finite, and the algorithm terminates. But it only terminates when we reach a $\sigma_i$ with $P^y(\mathcal{N}_{\sigma_i}(y)) = \mathcal{N}_{\sigma_i}(y)$. □
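The policy improvement argument is constructive, so it can be run on small examples. The following sketch implements it for a minPPS, using exact rational arithmetic. It repeatedly computes $z = \mathcal{N}_{\sigma_i}(y)$ by solving $(I - P'_{\sigma_i}(y))\,d = P_{\sigma_i}(y) - y$, and then switches the smallest violated form-M coordinate. The equation encoding and all function names are hypothetical. As noted above, this is a proof device rather than an efficient way to compute $I(y)$.

```python
from fractions import Fraction as F

# Hypothetical SNF encoding for a minPPS (illustration only):
#   ('L', c, ((j, a_j), ...)):  x_i = c + sum_j a_j x_j
#   ('Q', j, k):                x_i = x_j * x_k
#   ('M', j, k):                x_i = min{x_j, x_k}
# A policy sigma maps each form-M index i to the chosen successor (j or k).

def solve(A, b):
    """Exact Gauss-Jordan elimination over Fractions."""
    n = len(b)
    M = [list(r) + [v] for r, v in zip(A, b)]
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            raise ValueError("singular system")
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * w for u, w in zip(M[r], M[c])]
    return [M[r][n] / M[r][r] for r in range(n)]

def newton_sigma(eqs, sigma, y):
    """N_sigma(y) = y + (I - P'_sigma(y))^{-1} (P_sigma(y) - y)."""
    n = len(eqs)
    J = [[F(0)] * n for _ in range(n)]
    Py = []
    for i, e in enumerate(eqs):
        if e[0] == 'L':
            Py.append(e[1] + sum(a * y[j] for j, a in e[2]))
            for j, a in e[2]:
                J[i][j] += a
        elif e[0] == 'Q':
            _, j, k = e
            Py.append(y[j] * y[k])
            J[i][j] += y[k]
            J[i][k] += y[j]
        else:                                   # form M, fixed by the policy
            Py.append(y[sigma[i]])
            J[i][sigma[i]] += 1
    A = [[(1 if r == c else 0) - J[r][c] for c in range(n)] for r in range(n)]
    d = solve(A, [p - v for p, v in zip(Py, y)])
    return [v + w for v, w in zip(y, d)]

def lin_min(eqs, y, z):
    """P^y(z): form Q linearized at y, form M evaluated as a min."""
    out = []
    for e in eqs:
        if e[0] == 'L':
            out.append(e[1] + sum(a * z[j] for j, a in e[2]))
        elif e[0] == 'Q':
            _, j, k = e
            out.append(y[j] * z[k] + z[j] * y[k] - y[j] * y[k])
        else:
            out.append(min(z[e[1]], z[e[2]]))
    return out

def gnm_step_min(eqs, y):
    """Policy improvement from the proof of Lemma 3.14; returns (sigma, N_sigma(y))."""
    sigma = {i: e[1] for i, e in enumerate(eqs) if e[0] == 'M'}   # arbitrary start
    while True:
        z = newton_sigma(eqs, sigma, y)
        # P^y(z) <= z always holds here, so "no strict drop" means P^y(z) = z.
        bad = [i for i, (p, v) in enumerate(zip(lin_min(eqs, y, z), z)) if p < v]
        if not bad:
            return sigma, z
        j = bad[0]                              # smallest violated coordinate (form M)
        _, a, b = eqs[j]
        sigma[j] = a if z[a] < z[b] else b      # switch to the strictly smaller successor

eqs = [('M', 1, 2),                        # x0 = min{x1, x2}
       ('L', F(1, 3), ((3, F(1, 2)),)),    # x1 = 1/3 + 1/2 x3
       ('L', F(3, 10), ()),                # x2 = 3/10
       ('Q', 1, 1)]                        # x3 = x1 * x1
sigma, z = gnm_step_min(eqs, [F(0)] * 4)
```

For this hypothetical minPPS, starting from $y = 0$, the sketch switches $\sigma(0)$ from $1$ to $2$ once and returns $z = (3/10, 1/3, 3/10, 0)$, which by Lemma 3.16 below is $I(0)$.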

We note that the analogous policy improvement algorithm might fail to work for maxPPSs, as we might reach a policy $\sigma_i$ where $(I - P'_{\sigma_i}(y))^{-1}$ does not exist, or has a negative entry.

The next Lemma shows that this policy improvement algorithm always produces a coordinate-wise minimal Newton iterate over all policies.

Lemma 3.15. For a minPPS, $x = P(x)$, with LFP $0 < q^* < 1$, if $0 \leq y \leq q^*$ and $\sigma$ is a policy such that $P^y(\mathcal{N}_\sigma(y)) = \mathcal{N}_\sigma(y)$, then:

(i) For any policy $\sigma'$, $\mathcal{N}_{\sigma'}(y) \geq \mathcal{N}_\sigma(y)$.

(ii) For any $x \in \mathbb{R}^n$ with $P^y(x) \geq x$, we have $x \leq \mathcal{N}_\sigma(y)$.
(iii) For any $x \in \mathbb{R}^n$ with $P^y(x) \leq x$, we have $x \geq \mathcal{N}_\sigma(y)$.
(iv) $\mathcal{N}_\sigma(y)$ is the unique fixed point of $x = P^y(x)$.

(v) $\mathcal{N}_\sigma(y) \leq q^*$.

Proof. Note firstly that, by Lemma 3.13, for any policy $\sigma$, $(I - P'_\sigma(y))^{-1}$ exists and is non-negative, and $\mathcal{N}_\sigma(y)$ is defined.

(i) Consider $P^y_{\sigma'}(\mathcal{N}_\sigma(y))$. Note that $P^y_{\sigma'}(\mathcal{N}_\sigma(y)) \geq P^y(\mathcal{N}_\sigma(y)) = \mathcal{N}_\sigma(y)$ by assumption. Thus, by Lemma 3.5 (ii), $\mathcal{N}_\sigma(y) \leq \mathcal{N}_{\sigma'}(y)$.

(ii) $P^y_\sigma(x) \geq P^y(x) \geq x$, so by Lemma 3.5 (ii), $x \leq \mathcal{N}_\sigma(y)$.

(iii) If $P^y(x) \leq x$, then there is a policy $\sigma'$ with $P^y_{\sigma'}(x) \leq x$, and by Lemma 3.5 (ii), $x \geq \mathcal{N}_{\sigma'}(y)$. So, using part (i) of this Lemma, $x \geq \mathcal{N}_{\sigma'}(y) \geq \mathcal{N}_\sigma(y)$.

(iv) By assumption, $\mathcal{N}_\sigma(y)$ is a fixed point of $x = P^y(x)$. We just need uniqueness. If $P^y(q) = q$, then by parts (ii) and (iii) of this Lemma, $q \leq \mathcal{N}_\sigma(y)$ and $q \geq \mathcal{N}_\sigma(y)$, i.e., $q = \mathcal{N}_\sigma(y)$.

(v) Consider an optimal policy $\tau$ for the minPPS, $x = P(x)$. From Lemma 2.11 it follows that $\mathcal{N}_\tau(y) \leq q^*_\tau = q^*$. Part (i) of this Lemma then gives us that $\mathcal{N}_\sigma(y) \leq \mathcal{N}_\tau(y) \leq q^*$.

□

We can now return to using linear programming, which we can do in polynomial time. Recall the LP that "defines" $I(y)$ for a minPPS:

$$\text{Maximize: } \textstyle\sum_i a_i; \quad \text{Subject to: } P^y(a) \geq a \qquad (4)$$

Lemma 3.16. For a minPPS, $x = P(x)$, with LFP $0 < q^* < 1$, and for $0 \leq y \leq q^*$, there is a unique optimal solution, which we call $I(y)$, to the LP (4), and furthermore $I(y) = \mathcal{N}_\sigma(y)$ for some policy $\sigma$, and $P^y(I(y)) = I(y)$.

Proof. By Lemma 3.14 there is a $\sigma$ such that $P^y(\mathcal{N}_\sigma(y)) = \mathcal{N}_\sigma(y)$. So $\mathcal{N}_\sigma(y)$ is a feasible solution of $P^y(a) \geq a$. Let $a$ be any solution of $P^y(a) \geq a$. By Lemma 3.15 (ii), $a \leq \mathcal{N}_\sigma(y)$. Consequently, $\sum_{i=1}^n a_i \leq \sum_{i=1}^n (\mathcal{N}_\sigma(y))_i$, with equality only if $a = \mathcal{N}_\sigma(y)$. So $\mathcal{N}_\sigma(y)$ is the unique optimal solution of the LP (4). □

In the maxPPS case, we had an iteration that was at least as good as iterating with the optimal policy. Here we have an iteration that is at least as bad! Nevertheless, we shall see that it is good enough. In the maxPPS case, the analog of Lemma 2.13 (namely Lemma 3.11) thus followed from Lemma 2.13. Here we crucially need a stronger result than Lemma 2.13.

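Lemma 3.17. If $x = P(x)$ is a PPS and we are given $x, y \in \mathbb{R}^n$ with $0 \leq x \leq y \leq P(y) \leq 1$, and if the following conditions hold:

$$\lambda > 0 \text{ and } y - x \leq \lambda(1 - y) \text{ and } (I - P'(x))^{-1} \text{ exists and is non-negative,} \qquad (5)$$

then $y - \mathcal{N}(x) \leq \frac{\lambda}{2}(1 - y)$.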
(Note that we cannot conclude that $y - \mathcal{N}(x) \geq 0$.)

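Proof. Firstly, we show that $P'(y)(1 - y) \leq (1 - y)$. Clearly, for any PPS, $P(1) \leq 1$. Note that since by assumption $y \leq P(y)$, we have $(1 - y) \geq (1 - P(y)) \geq (P(1) - P(y))$. Then by Lemma 3.3 of [11]:

$$(1 - y) \geq P(1) - P(y) = P'\left(\tfrac{1 + y}{2}\right)(1 - y) \qquad (6)$$

$$\geq P'(y)(1 - y) \qquad (7)$$

where (7) holds because $P'$ is monotone, $\frac{1 + y}{2} \geq y$, and $1 - y \geq 0$. Again by Lemma 3.3 of [11], $P(y) - P(x) = \frac{1}{2}(P'(x) + P'(y))(y - x)$, and thus:

$$P(x) = P(y) - \tfrac{1}{2}(P'(x) + P'(y))(y - x) \qquad (8)$$

Thus:

$$y - \mathcal{N}(x) = y - x - (I - P'(x))^{-1}(P(x) - x)$$

$$= y - x - (I - P'(x))^{-1}\left(P(y) - x - \tfrac{1}{2}(P'(x) + P'(y))(y - x)\right) \quad \text{(by (8))}$$

$$\leq y - x - (I - P'(x))^{-1}\left(y - x - \tfrac{1}{2}(P'(x) + P'(y))(y - x)\right) \quad \text{(since } P(y) \geq y \text{ and } (I - P'(x))^{-1} \geq 0\text{)}$$

$$= (y - x) - (I - P'(x))^{-1}\left((y - x) - \tfrac{1}{2}(P'(x) + P'(y))(y - x)\right)$$

$$= \left(I - (I - P'(x))^{-1}\left(I - \tfrac{1}{2}(P'(x) + P'(y))\right)\right)(y - x)$$

$$= \left((I - P'(x))^{-1}(I - P'(x)) - (I - P'(x))^{-1}\left(I - \tfrac{1}{2}(P'(x) + P'(y))\right)\right)(y - x)$$

$$= (I - P'(x))^{-1}\left(I - P'(x) - \left(I - \tfrac{1}{2}(P'(x) + P'(y))\right)\right)(y - x)$$

$$= (I - P'(x))^{-1}\left(-P'(x) + \tfrac{1}{2}(P'(x) + P'(y))\right)(y - x)$$

$$= (I - P'(x))^{-1}\,\tfrac{1}{2}(P'(y) - P'(x))(y - x)$$

$$\leq \tfrac{\lambda}{2}(I - P'(x))^{-1}(P'(y) - P'(x))(1 - y) \quad \text{(by (5), and because } (P'(y) - P'(x)) \geq 0\text{)}$$

$$\leq \tfrac{\lambda}{2}(I - P'(x))^{-1}(I - P'(x))(1 - y) \quad \text{(because by (7), } P'(y)(1 - y) \leq (1 - y)\text{)}$$

$$= \tfrac{\lambda}{2}(1 - y)$$

□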

Lemma 3.18. Let $x = P(x)$ be a minPPS, with LFP $0 < q^* < 1$. For any $0 \leq x \leq q^*$ and $\lambda > 0$, $I(x) \leq q^*$, and if:

$$q^* - x \leq \lambda(1 - q^*)$$

then

$$q^* - I(x) \leq \frac{\lambda}{2}(1 - q^*)$$
Proof. By Lemmas 3.14 and 3.16 there is a policy $\sigma$ with $I(x) = \mathcal{N}_\sigma(x)$. We then apply Lemma 3.17 to $x = P_\sigma(x)$, $x$, and $q^*$ instead of $y$. Observe that $P_\sigma(q^*) \geq P(q^*) = q^*$, and that $(I - P'_\sigma(x))^{-1}$ exists and is non-negative (by Lemma 3.13). Thus the conditions of Lemma 3.17 hold, and we can conclude that $q^* - \mathcal{N}_\sigma(x) \leq \frac{\lambda}{2}(1 - q^*)$. Lastly, Lemma 3.15 (v) and Lemma 3.16 yield that $I(x) = \mathcal{N}_\sigma(x) \leq q^*$. □

Proposition 3.7 for minPPSs follows from Lemmas 3.16 and 3.18.

3.5 A polynomial-time algorithm (in the Turing model) for max/minPPSs 

In [11] we gave a polynomial time algorithm, in the standard Turing model of computation, for approximating the LFP of a PPS, $x = P(x)$, using Newton's method. Here we use the same methods from [11], with our new Generalized Newton's Method (GNM), $I(x)$, to obtain polynomial-time algorithms (again, in the standard Turing model) for approximating the LFP of maxPPSs and minPPSs. The proof in [11] uses induction based on the "halving lemma", Lemma 2.13. We now have suitable "halving lemmas" for maxPPSs and minPPSs, namely Lemmas 3.11 and 3.18. In [11], the following bound was used for the base case of the induction:

Lemma 3.19 (Theorem 3.12 from [11]). If $0 < q^* < 1$ is the LFP of a PPS, $x = P(x)$, in $n$ variables, then for all $i \in \{1, \ldots, n\}$:

$$1 - q^*_i \geq 2^{-4|P|}$$

In other words, $0 < q^*_i \leq 1 - 2^{-4|P|}$, for all $i \in \{1, \ldots, n\}$.

We can now easily derive an analogous Lemma for the setting of max/minPPSs: 

Lemma 3.20. If $0 < q^* < 1$ is the LFP of a max/minPPS, $x = P(x)$, in $n$ variables, then for all $i \in \{1, \ldots, n\}$:

$$1 - q^*_i \geq 2^{-4|P|}$$
In other words, $0 < q^*_i \leq 1 - 2^{-4|P|}$, for all $i \in \{1, \ldots, n\}$.

Proof. Let $\tau$ be any optimal policy for $x = P(x)$. We know it exists, by Theorem 2.5. Lemma 3.19, applied to the PPS $x = P_\tau(x)$, whose LFP is $q^*_\tau = q^*$, gives that $1 - q^*_i \geq 2^{-4|P_\tau|}$. All we need is to note that $|P| \geq |P_\tau|$, which clearly holds using any sensible encoding for $P$ and $P_\tau$, in the sense that we should need no more bits to encode $x_i = x_j$ than to encode $x_i = \max\{x_j, x_k\}$ or $x_i = \min\{x_j, x_k\}$. □

Now we can give a polynomial time algorithm, in the Turing model of computation, for approximating the LFP, $q^*$, of a max/minPPS to within any desired precision. The algorithm carries out iterations of GNM using the same rounding technique, the same rounding parameter, and the same number of iterations as in [11]. Specifically, we use the following algorithm with rounding parameter $h$:

Start with $x^{(0)} := 0$;

For each $k \geq 0$ compute $x^{(k+1)}$ from $x^{(k)}$ as follows:

1. Calculate $I(x^{(k)})$ by solving the following LP:
Minimize: $\sum_i x_i$; Subject to: $P^{x^{(k)}}(x) \leq x$, if $x = P(x)$ is a maxPPS,
or:
Maximize: $\sum_i x_i$; Subject to: $P^{x^{(k)}}(x) \geq x$, if $x = P(x)$ is a minPPS.
2. For each coordinate $i = 1, 2, \ldots, n$, set $x^{(k+1)}_i$ to be the maximum (non-negative) multiple of $2^{-h}$ which is $\leq \max\{0, I(x^{(k)})_i\}$. (In other words, we round $I(x^{(k)})$ down to the nearest multiple of $2^{-h}$ and ensure it is non-negative.)
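Step 2 and the outer loop can be sketched as follows, with exact rationals. Here `gnm_step` is a hypothetical callable standing in for whichever exact LP-based computation of $I(x)$ is used. The value passed for $|P|$ in the usage line is purely illustrative, not the true encoding length. For a PPS (no form-M equations), a GNM step coincides with a Newton step by Lemma 3.5 (i), and the usage example plugs in a Newton step for that reason.

```python
import math
from fractions import Fraction as F

def round_down(v, h):
    """Largest multiple of 2^-h that is <= max(0, v) (step 2 above)."""
    scale = 2 ** h
    return F(math.floor(max(F(0), F(v)) * scale), scale)

def rounded_gnm(gnm_step, n, j, size_P):
    """x^(0) = 0; x^(k+1) = round_down(I(x^(k))), for j + 2 + 4|P| iterations.

    gnm_step(x) must return I(x); it is a hypothetical stand-in for an exact
    LP-based implementation. size_P plays the role of |P|.
    """
    h = j + 2 + 4 * size_P
    x = [F(0)] * n
    for _ in range(h):
        x = [round_down(v, h) for v in gnm_step(x)]
    return x

# Usage on the PPS x = 3/5 x^2 + 2/5 (LFP 2/3), where a GNM step is a Newton step.
newton = lambda x: [x[0] + (F(3, 5) * x[0] ** 2 + F(2, 5) - x[0]) / (1 - F(6, 5) * x[0])]
approx = rounded_gnm(newton, n=1, j=10, size_P=2)   # size_P=2 is illustrative only
```

Rounding keeps every denominator at most $2^h$, which is what keeps each LP's encoding size polynomial in the proof of Corollary 3.22.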

Theorem 3.21. Given any max/minPPS, $x = P(x)$, with LFP $0 < q^* < 1$, if we use the above algorithm with rounding parameter $h = j + 2 + 4|P|$, then the iterations are all defined, and for every $k \geq 0$ we have $0 \leq x^{(k)} \leq q^*$, and furthermore after $h = j + 2 + 4|P|$ iterations we have:

$$\|q^* - x^{(j + 2 + 4|P|)}\|_\infty \leq 2^{-j}$$

The proof is very similar to the proof of Theorem 4.2 in [11] and is given in the Appendix.

Corollary 3.22. Given any max/minPPS, $x = P(x)$, with LFP $q^*$, and given any integer $j > 0$, there is an algorithm that computes a rational vector $v$ with $\|q^* - v\|_\infty \leq 2^{-j}$, in time polynomial in $|P|$ and $j$.

Proof. First, we use the algorithms given in [14] (Theorems 11 and 13) to detect those variables $x_i$ with $q^*_i = 0$ or $q^*_i = 1$ in time polynomial in $|P|$. Then we can remove these from the max/minPPS by substituting their known values into the equations for other variables. This gives us a max/minPPS with LFP $0 < q'^* < 1$ and does not increase $|P|$. Now we can use the iterated GNM, with rounding down, as outlined earlier in this section. In each iteration of GNM we solve an LP. Each LP has at most $n \leq |P|$ variables and at most $2n$ constraints, and the numerators and denominators of each rational coefficient are no larger than $2^{j + 2 + 4|P|}$, so it can be solved in time polynomial in $|P|$ and $j$ using standard algorithms. We need only $j + 2 + 4|P|$ iterations, involving one LP each. Putting the removed 0 and 1 values back into the resulting vector gives us the desired approximation of the full vector $q^*$. This can all be done in polynomial time. □



4 Computing an ε-optimal policy in P-time

First, let us note that we cannot hope to compute an optimal policy in P-time without a major breakthrough:

Theorem 4.1. Computing an optimal policy for a max/minPPS is PosSLP-hard. 

Proof. Recall from [15, 11] that the termination probability vector $q^*$ of an SCFG (equivalently, of a 1-exit RMC) can be equivalently viewed as the LFP of a purely probabilistic PPS, and vice versa.

It was shown in [15] (Theorems 5.1 and 5.3) that, given a PPS (equivalently, an SCFG or 1-RMC) and a rational probability $p$, it is PosSLP-hard to decide whether the LFP satisfies $q^*_1 > p$, as well as to decide whether $q^*_1 < p$. (In fact, these hardness results hold even if $p = 1/2$.)

The fact that computing an optimal policy for max/minPPSs is PosSLP-hard follows easily from this. For the case of maxPPSs (minPPSs, respectively), given a PPS, $x = P(x)$, and given $p$, we simply add a new variable $x_0$ to the PPS, and a corresponding equation:

$$x_0 = \max\{p, x_1\} \quad (x_0 = \min\{p, x_1\}, \text{ respectively}) \qquad (9)$$

It is clear that $q^*_1 > p$ ($q^*_1 < p$, respectively) for the original PPS if and only if every optimal policy $\sigma$ for the augmented maxPPS (minPPS, respectively) picks $x_1$ rather than $p$ on the RHS of equation (9). So, if we could compute an optimal policy for a maxPPS (minPPS), we would be able to decide whether $q^*_1 > p$ (whether $q^*_1 < p$, respectively). □
Since we cannot hope to compute an optimal policy for max/minPPSs in P-time without a major breakthrough, we will instead seek a policy $\sigma$ such that $\|q^*_\sigma - q^*\|_\infty \le \varepsilon$, for a given desired $\varepsilon > 0$, in time $\mathrm{poly}(|P|, \log(1/\varepsilon))$. We have an algorithm for approximating $q^*$. Can we use a sufficiently close approximation $q$ to $q^*$ to find such an $\varepsilon$-optimal strategy? Once we have an approximation $q$, it seems natural to consider policies $\sigma$ such that $P_\sigma(q) = P(q)$. For minPPSs, this means choosing the variable that has the lowest approximate value $q_j$, and for maxPPSs it means choosing the variable that has the highest approximate value. It turns out that this works as long as we can establish good enough upper bounds on the norm of $(I - P'_\sigma(x))^{-1}$ for certain values of $x$.
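As a minimal sketch of this selection rule, suppose each form-M equation is stored as a pair `(j, k)` keyed by its left-hand-side index `i`, and `q` holds the approximation to $q^*$. This encoding and the name `greedy_policy` are hypothetical, not taken from the paper. Ties go to the first argument, which is an arbitrary choice.

```python
def greedy_policy(m_eqs, q, maximize):
    """Return a policy sigma with P_sigma(q) = P(q) on the form-M equations.

    m_eqs: hypothetical encoding {i: (j, k)} meaning x_i = max{x_j, x_k}
           (maxPPS) or x_i = min{x_j, x_k} (minPPS).
    q:     approximation of the LFP q*, indexed by variable.
    """
    policy = {}
    for i, (j, k) in m_eqs.items():
        if maximize:
            policy[i] = j if q[j] >= q[k] else k   # highest approximate value
        else:
            policy[i] = j if q[j] <= q[k] else k   # lowest approximate value
    return policy


print(greedy_policy({0: (1, 2)}, [0.4, 0.35, 0.37], maximize=True))
```

For a minPPS this choice is enough once $q$ is close to $q^*$. For a maxPPS an extra repair step is needed, which is developed below.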

Recall that for a square matrix $A$, $\rho(A)$ denotes its spectral radius. For a vector $x$, the norm is $\|x\|_\infty := \max_i |x_i|$, and its associated matrix norm $\|A\|_\infty$ is the maximum absolute-value row sum of $A$, i.e., $\|A\|_\infty := \max_i \sum_j |A_{i,j}|$.
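For concreteness, here are these three quantities in NumPy. The matrix is a made-up example. The check $\rho(A) \le \|A\|_\infty$ is the standard fact that any induced matrix norm bounds the spectral radius.

```python
import numpy as np


def vec_inf_norm(x):
    """||x||_inf := max_i |x_i|."""
    return float(np.max(np.abs(x)))


def mat_inf_norm(A):
    """||A||_inf := max_i sum_j |A_ij| (maximum absolute row sum)."""
    return float(np.max(np.sum(np.abs(A), axis=1)))


def spectral_radius(A):
    """rho(A) := max |lambda| over the eigenvalues lambda of A."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))


A = np.array([[0.5, 0.25],
              [0.125, 0.5]])
# Any induced matrix norm bounds the spectral radius: rho(A) <= ||A||_inf.
print(mat_inf_norm(A), spectral_radius(A))
```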

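Theorem 4.2. For a max/minPPS, $x = P(x)$, given $0 \le q \le q^*$ such that $q < 1$, and a policy $\sigma$ such that $P(q) = P_\sigma(q)$ and such that $\rho(P'_\sigma(\frac{1}{2}(q^*_\sigma + q^*))) < 1$, so that $(I - P'_\sigma(\frac{1}{2}(q^*_\sigma + q^*)))^{-1}$ exists and is non-negative, we have

$$\|q^*_\sigma - q^*\|_\infty \le \left(2\,\|(I - P'_\sigma(\tfrac{1}{2}(q^*_\sigma + q^*)))^{-1}\|_\infty + 1\right)\|q^* - q\|_\infty$$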

Proof. We know that $q$ is close to $q^*$. We just have to show that $q$ is close to $q^*_\sigma$ as well. To do so, we exploit some results about PPSs established in [11].

Lemma 4.3. If $x = P(x)$ is a PPS with LFP $q^*$ such that $0 < q^* \le 1$, and $0 \le y \le q^*$ such that $y < 1$, then:

$$q^* - y = (I - P'(\tfrac{1}{2}(q^* + y)))^{-1}(P(y) - y)$$

Proof. Lemma 3.3 of [11] tells us that for any PPS, $x = P(x)$ (assumed to be in SNF form), and any pair of vectors $a, b \in \mathbb{R}^n$, we have $P(a) - P(b) = P'((a + b)/2)(a - b)$. Applying this to $a = q^*$ and $b = y$, and using $P(q^*) = q^*$, we have that

$$q^* - P(y) = P'(\tfrac{1}{2}(q^* + y))(q^* - y)$$
Subtracting both sides from $q^* - y$, we have that:

$$P(y) - y = (I - P'(\tfrac{1}{2}(q^* + y)))(q^* - y) \tag{10}$$

Now, by Lemma 2.12, we know that for any $0 \le z \le q^*$ such that $z < 1$, $(I - P'(z))^{-1}$ exists and is non-negative. Since $y \le q^*$, clearly also $\frac{1}{2}(q^* + y) \le q^*$. Since $y < 1$ and $q^* \le 1$, clearly $\frac{1}{2}(q^* + y) < 1$. Thus $(I - P'(\frac{1}{2}(q^* + y)))^{-1}$ exists and is non-negative. Multiplying both sides of equation (10) by $(I - P'(\frac{1}{2}(q^* + y)))^{-1}$, we obtain:

$$q^* - y = (I - P'(\tfrac{1}{2}(q^* + y)))^{-1}(P(y) - y)$$

as required. □ 
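The midpoint identity from Lemma 3.3 of [11] is exact for SNF-form PPSs, because every equation has degree at most 2. The sketch below checks it, and then the formula of Lemma 4.3, on a small PPS made up for illustration (not an example from the paper): $x_0 = 0.3 + 0.6x_1$ (form L) and $x_1 = x_0x_0$ (form Q).

```python
import numpy as np


# Hypothetical 2-variable PPS in SNF:
#   x0 = 0.3 + 0.6 * x1   (form L)
#   x1 = x0 * x0          (form Q)
def P(x):
    return np.array([0.3 + 0.6 * x[1], x[0] * x[0]])


def jacobian(x):
    return np.array([[0.0, 0.6],
                     [2.0 * x[0], 0.0]])


# Midpoint identity: P(a) - P(b) = P'((a + b)/2)(a - b), exact for degree <= 2.
rng = np.random.default_rng(0)
a, b = rng.random(2), rng.random(2)
midpoint_ok = np.allclose(P(a) - P(b), jacobian((a + b) / 2) @ (a - b))

# LFP: x0 = 0.3 + 0.6 x0^2, so x0 is the smaller root of 0.6 t^2 - t + 0.3 = 0.
t = (1 - np.sqrt(1 - 4 * 0.6 * 0.3)) / (2 * 0.6)
q_star = np.array([t, t * t])
y = 0.5 * q_star                      # some 0 <= y <= q* with y < 1
M = np.eye(2) - jacobian((q_star + y) / 2)
lemma_ok = np.allclose(q_star - y, np.linalg.solve(M, P(y) - y))
print(midpoint_ok, lemma_ok)
```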

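By assumption, $\sigma$ was chosen such that $P(q) = P_\sigma(q)$. Note also that since $0 \le q \le q^*$, we have $0 \le P'_\sigma(\frac{1}{2}(q + q^*_\sigma)) \le P'_\sigma(\frac{1}{2}(q^* + q^*_\sigma))$. Hence $0 \le \rho(P'_\sigma(\frac{1}{2}(q + q^*_\sigma))) \le \rho(P'_\sigma(\frac{1}{2}(q^* + q^*_\sigma))) < 1$. Thus $(I - P'_\sigma(\frac{1}{2}(q + q^*_\sigma)))^{-1}$ also exists and is non-negative. Using this, applying Lemma 4.3 to the PPS $x = P_\sigma(x)$ with $y := q$, and taking norms, we obtain the following inequality:

$$\|q^*_\sigma - q\|_\infty \le \|(I - P'_\sigma(\tfrac{1}{2}(q + q^*_\sigma)))^{-1}\|_\infty \, \|P(q) - q\|_\infty \tag{11}$$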

To find a bound on $\|P(q) - q\|_\infty$, we need the following:
Lemma 4.4. If $x = P(x)$ is a max/minPPS, and if $0 \le y \le q^*$, then $\|P(y) - y\|_\infty \le 2\|q^* - y\|_\infty$.

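Proof. Suppose first that $x = P(x)$ is a PPS. By Lemma 3.3 of [11], we have that $q^* - P(y) = P'(\frac{1}{2}(y + q^*))(q^* - y)$. Since $\frac{1}{2}(y + q^*) \le 1$, we have $\|P'(\frac{1}{2}(y + q^*))\|_\infty \le 2$, for the following reason. If the $i$th row has $x_i = P_i(x)$ of type L, then $\sum_{j=1}^{n} |P'(\frac{1}{2}(y + q^*))_{i,j}| \le 1$. If $x_i = P_i(x) = x_j x_k$ has type Q, then $\sum_{l=1}^{n} |P'(\frac{1}{2}(y + q^*))_{i,l}| = \frac{1}{2}(y_j + q^*_j) + \frac{1}{2}(y_k + q^*_k) \le 2$. So we have that $\|q^* - P(y)\|_\infty \le \|P'(\frac{1}{2}(y + q^*))\|_\infty \, \|q^* - y\|_\infty \le 2\|q^* - y\|_\infty$.

As well as $y \le q^*$, we know that $P(y) \le q^*$, since $P(x)$ is monotone. If $P_i(y) \le y_i$, then

$$y_i - P_i(y) \le q^*_i - P_i(y) \le \|q^* - P(y)\|_\infty \le 2\|q^* - y\|_\infty.$$

If $P_i(y) > y_i$, then $P_i(y) - y_i \le q^*_i - y_i \le \|q^* - y\|_\infty$.

So $\|P(y) - y\|_\infty \le 2\|q^* - y\|_\infty$, as required.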

If $x = P(x)$ is a max/minPPS, then it has some optimal policy $\tau$, and from the above, $\|P_\tau(y) - y\|_\infty \le 2\|q^* - y\|_\infty$. It thus only remains to show that $|P_i(y) - y_i| \le 2\|q^* - y\|_\infty$ when $x_i = P_i(x)$ is of form M (because the other equations don't change in $x = P_\tau(x)$).

If $P_i(y) > y_i$, then this follows easily: as before, we have that $P_i(y) - y_i \le q^*_i - y_i \le \|q^* - y\|_\infty$. Suppose instead that $P_i(y) \le y_i$. Then we consider the two cases (min and max) separately:

Suppose $x = P(x)$ is a minPPS, and that $P_i(x) = \min\{x_j, x_k\}$. Since $q^* = P(q^*)$, we have:

$$0 \le y_i - P_i(y) \le q^*_i - P_i(y) = \min\{q^*_j, q^*_k\} - P_i(y) \tag{12}$$

We can assume, w.l.o.g., that $P_i(y) = \min\{y_j, y_k\} = y_j$. (The case where $P_i(y) = y_k$ is entirely analogous.) Then, by (12), we have:

$$0 \le y_i - P_i(y) \le \min\{q^*_j, q^*_k\} - y_j \le q^*_j - y_j \le \|q^* - y\|_\infty$$

Suppose now that $x = P(x)$ is a maxPPS, and that $P_i(x) = \max\{x_j, x_k\}$. Again, we are already assuming that $P_i(y) \le y_i$. Since $q^* = P(q^*)$, we have:

$$0 \le y_i - P_i(y) \le q^*_i - P_i(y) = P_i(q^*) - \max\{y_j, y_k\} \tag{13}$$

We can assume, w.l.o.g., that $P_i(q^*) = \max\{q^*_j, q^*_k\} = q^*_j$. (Again, the case when $P_i(q^*) = q^*_k$ is entirely analogous.) Then, by (13), we have:

$$0 \le y_i - P_i(y) \le q^*_j - \max\{y_j, y_k\} \le q^*_j - y_j \le \|q^* - y\|_\infty$$

This completes the proof of the Lemma for all max/minPPSs. □

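Now we can show the result:

$$\begin{aligned}
\|q^*_\sigma - q^*\|_\infty &\le \|q^*_\sigma - q\|_\infty + \|q^* - q\|_\infty \\
&\le 2\,\|(I - P'_\sigma(\tfrac{1}{2}(q + q^*_\sigma)))^{-1}\|_\infty \, \|q^* - q\|_\infty + \|q^* - q\|_\infty \\
&\le \left(2\,\|(I - P'_\sigma(\tfrac{1}{2}(q^* + q^*_\sigma)))^{-1}\|_\infty + 1\right)\|q^* - q\|_\infty
\end{aligned}$$

Here the second inequality combines (11) with Lemma 4.4 (applied with $y := q$). The last inequality follows because $q \le q^*$, and

$$0 \le (I - P'_\sigma(\tfrac{1}{2}(q + q^*_\sigma)))^{-1} = \sum_{j=0}^{\infty} \left(P'_\sigma(\tfrac{1}{2}(q + q^*_\sigma))\right)^j \le \sum_{j=0}^{\infty} \left(P'_\sigma(\tfrac{1}{2}(q^* + q^*_\sigma))\right)^j = (I - P'_\sigma(\tfrac{1}{2}(q^* + q^*_\sigma)))^{-1}.$$

□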
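The comparison of inverses in this last step can be illustrated numerically. The sketch below uses two made-up non-negative matrices with $B \le C$ entrywise and $\rho(C) < 1$. They stand in for $P'_\sigma(\frac{1}{2}(q + q^*_\sigma))$ and $P'_\sigma(\frac{1}{2}(q^* + q^*_\sigma))$; this is an illustration under assumed values, not part of the proof.

```python
import numpy as np


def truncated_neumann(B, terms=200):
    """sum_{j < terms} B^j, which approximates (I - B)^{-1} when rho(B) < 1."""
    n = B.shape[0]
    total, power = np.zeros((n, n)), np.eye(n)
    for _ in range(terms):
        total += power
        power = power @ B
    return total


# Made-up matrices with 0 <= B <= C entrywise and rho(C) < 1.
B = np.array([[0.1, 0.3], [0.2, 0.1]])
C = np.array([[0.2, 0.4], [0.3, 0.2]])
inv_B = np.linalg.inv(np.eye(2) - B)
inv_C = np.linalg.inv(np.eye(2) - C)
print(np.allclose(inv_B, truncated_neumann(B)), bool(np.all(inv_B <= inv_C)))
```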

Finding these bounds is different for maxPPSs and minPPSs. Although we assume that $0 < q^* < 1$, for an arbitrary policy $\sigma$ it need not be true that $0 < q^*_\sigma < 1$. But the following obviously does hold:

Proposition 4.5. Given a max/minPPS, $x = P(x)$, with LFP $q^*$ such that $0 < q^* < 1$, for any policy $\sigma$:

(i) If $x = P(x)$ is a maxPPS, then $q^*_\sigma < 1$.

(ii) If $x = P(x)$ is a minPPS, then $q^*_\sigma > 0$.

Proof. This is trivial: if $x = P(x)$ is a maxPPS, then clearly $q^*_\sigma \le q^* < 1$, because $\sigma$ can be no better than an optimal strategy. Likewise, if $x = P(x)$ is a minPPS, then $0 < q^* \le q^*_\sigma$, for the same reason. □

For maxPPSs, we may have that some coordinate of $q^*_\sigma$ is equal to 0, and for minPPSs we may have that some coordinate of $q^*_\sigma$ is equal to 1, even when $0 < q^* < 1$. This is the source of the different complications. We prove the following result in the appendix:

Theorem 4.6. If $x = P(x)$ is a PPS with LFP $q^* > 0$, then

(i) If $q^* < 1$ and $0 \le y < 1$, then $(I - P'(\frac{1}{2}(y + q^*)))^{-1}$ exists and is non-negative, and

$$\|(I - P'(\tfrac{1}{2}(y + q^*)))^{-1}\|_\infty \le 2^{10|P|}\max\left\{\frac{2}{(1 - y)_{\min}}, 2^{|P|}\right\}$$

(ii) If $q^* = 1$ and $x = P(x)$ is strongly connected (i.e., every variable depends directly or indirectly on every other) and $0 \le y < 1 = q^*$, then $(I - P'(y))^{-1}$ exists and is non-negative, and

$$\|(I - P'(y))^{-1}\|_\infty \le 2^{4|P|}\frac{1}{(1 - y)_{\min}}$$

We first focus on minPPSs, for which we shall show that if $y$ is a close approximation to $q^*$, then any policy $\sigma$ with $P(y) = P_\sigma(y)$ is $\varepsilon$-optimal. The maxPPS case will not be so simple: the analogous statement is false for maxPPSs.

Theorem 4.7. If $x = P(x)$ is a minPPS with LFP $0 < q^* < 1$, and $0 < \varepsilon < 1$, and $0 \le y \le q^*$ such that $\|q^* - y\|_\infty \le 2^{-14|P|-3}\varepsilon$, then for any policy $\sigma$ with $P_\sigma(y) = P(y)$, $\|q^* - q^*_\sigma\|_\infty \le \varepsilon$.

Proof. By Proposition 4.5, $q^*_\sigma \ge q^*$, and so $q^*_\sigma > 0$. Suppose for now that $q^*_\sigma < 1$ (we will show this later). Then applying Theorem 4.6 (i), with $y := q^*$ and the PPS $x = P_\sigma(x)$, yields that

$$\|(I - P'_\sigma(\tfrac{1}{2}(q^* + q^*_\sigma)))^{-1}\|_\infty \le 2^{10|P_\sigma|}\max\left\{\frac{2}{(1 - q^*)_{\min}}, 2^{|P_\sigma|}\right\}$$
Note that $|P_\sigma| \le |P|$. For any minPPS, $x = P(x)$, there is an optimal strategy $\tau$, and $x = P_\tau(x)$ is a PPS with the same LFP, $q^*_\tau = q^*$, as $x = P(x)$. Since furthermore $|P_\tau| \le |P|$, it follows from Theorem 3.12 of [11] that $(1 - q^*)_{\min} \ge 2^{-4|P|}$. Thus

$$\|(I - P'_\sigma(\tfrac{1}{2}(q^* + q^*_\sigma)))^{-1}\|_\infty \le 2^{14|P|+1}$$

Theorem 4.2 now gives that

$$\|q^* - q^*_\sigma\|_\infty \le (2^{14|P|+2} + 1)\|q^* - y\|_\infty \le \varepsilon$$

Thus, under the assumption that $q^*_\sigma < 1$, we are done.

To complete the proof, we now show that $q^*_\sigma < 1$. Suppose, for a contradiction, that for some $i$, $(q^*_\sigma)_i = 1$. Then, by results in [15], $x = P_\sigma(x)$ has a bottom strongly connected component $S$ with $(q^*_\sigma)_S = 1$. If $x_i$ is in $S$, then only variables in $S$ appear in $(P_\sigma)_i(x)$, so we write $x_S = P_S(x_S)$ for the PPS formed by these equations. We also have that $P'_S(1)$ is irreducible and that the least fixed-point solution of $x_S = P_S(x_S)$ is $q^*_S = 1$. Take $y_S$ to be the subvector of $y$ with coordinates in $S$. Now, applying Theorem 4.6 (ii) with the $y$ in its statement taken to be $\frac{1}{2}(y_S + 1)$ gives that

$$\|(I - P'_S(\tfrac{1}{2}(y_S + 1)))^{-1}\|_\infty \le 2^{4|P_S|}\frac{2}{(1 - y_S)_{\min}}$$

But $|P_S| \le |P|$ and $(1 - y_S)_{\min} \ge (1 - q^*)_{\min} \ge 2^{-4|P|}$. Thus

$$\|(I - P'_S(\tfrac{1}{2}(y_S + 1)))^{-1}\|_\infty \le 2^{8|P|+1}$$



Lemma 4.3 gives that

$$1 - y_S = (I - P'_S(\tfrac{1}{2}(1 + y_S)))^{-1}(P_S(y_S) - y_S)$$

Taking norms and rearranging gives:

$$\|P_S(y_S) - y_S\|_\infty \ge \frac{\|1 - y_S\|_\infty}{\|(I - P'_S(\tfrac{1}{2}(y_S + 1)))^{-1}\|_\infty} \ge \frac{2^{-4|P|}}{2^{8|P|+1}} = 2^{-12|P|-1}$$

However, $\|P_S(y_S) - y_S\|_\infty \le \|P_\sigma(y) - y\|_\infty$ and $P_\sigma(y) = P(y)$. We deduce that $\|P(y) - y\|_\infty \ge 2^{-12|P|-1}$. Lemma 4.4 states that $\|P(y) - y\|_\infty \le 2\|q^* - y\|_\infty$. We thus have $\|q^* - y\|_\infty \ge 2^{-12|P|-2}$. This contradicts our assumption that $\|q^* - y\|_\infty \le 2^{-14|P|-3}\varepsilon$ for some $\varepsilon < 1$. □
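The two numeric steps in this proof can be sanity-checked with exact rational arithmetic. The first is $(2^{14|P|+2} + 1)\cdot 2^{-14|P|-3}\varepsilon \le \varepsilon$. The second is that the derived lower bound $2^{-12|P|-2}$ exceeds the assumed error $2^{-14|P|-3}\varepsilon$ when $\varepsilon < 1$. The helper name and the ranges of $|P|$ and $\varepsilon$ tried are arbitrary choices for illustration.

```python
from fractions import Fraction


def check_theorem_4_7_constants(size_P, eps):
    """Check both numeric steps of the proof of Theorem 4.7 for given |P| and eps < 1."""
    delta = Fraction(1, 2 ** (14 * size_P + 3)) * eps      # assumed ||q* - y||_inf bound
    # Step via Theorem 4.2: (2^{14|P|+2} + 1) * delta <= eps.
    first = (2 ** (14 * size_P + 2) + 1) * delta <= eps
    # Contradiction step: 2^{-12|P|-2} must exceed delta.
    second = Fraction(1, 2 ** (12 * size_P + 2)) > delta
    return first and second


print(all(check_theorem_4_7_constants(n, Fraction(k, 10))
          for n in range(1, 30) for k in range(1, 10)))
```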

Now we proceed to the harder case of maxPPSs. The main theorem in this case is the following. 

Theorem 4.8. If $x = P(x)$ is a maxPPS with $0 < q^* < 1$, and we are given $0 < \varepsilon < 1$ and a vector $y$ with $0 \le y \le q^*$ such that $\|q^* - y\|_\infty \le 2^{-14|P|-2}\varepsilon$, then there exists a policy $\sigma$ such that $\|q^* - q^*_\sigma\|_\infty \le \varepsilon$. Furthermore, such a policy can be computed in P-time, given $x = P(x)$ and $y$.

We need a policy $\sigma$ to which we can apply Theorem 4.6 and for which we can get good bounds on $\|P_\sigma(y) - y\|_\infty$. Firstly, we show that such policies exist. In fact, any optimal policy will do: for an optimal policy $\tau$, $q^*_\tau = q^* > 0$, and Lemma 4.4 applied to $x = P_\tau(x)$ gives that $\|P_\tau(y) - y\|_\infty \le 2^{-14|P|-1}\varepsilon$. Unfortunately, an optimal policy might be hard to find (Theorem 4.1). We can, however, given a policy $\sigma$ and the PPS $x = P_\sigma(x)$, easily detect in polynomial time whether $q^*_\sigma > 0$ (see, e.g., Theorem 2.2 of [15], and also [2]). We shall also make use of the following easy fact:
Lemma 4.9. If $x = P(x)$ is a PPS with $n$ variables and LFP $q^*$, then for any variable index $i \in \{1, \ldots, n\}$ the following are equivalent:

(i) $q^*_i > 0$.

(ii) There is a $k \ge 0$ such that $(P^k(0))_i > 0$.

(iii) $(P^n(0))_i > 0$.

Proof. (i) ⇒ (ii): From [15], $P^k(0) \to q^*$ as $k \to \infty$. It follows that if $(P^k(0))_i = 0$ for all $k$, then $q^*_i = 0$.

(ii) ⇒ (iii): Firstly, if there is a $1 \le k \le n$ with $(P^k(0))_i > 0$, then $(P^n(0))_i > 0$. Indeed, $P(0) \ge 0$, and so by monotonicity and an easy induction $P^{l+1}(0) \ge P^l(0)$ for all $l \ge 0$. Another induction gives that $P^m(0) \ge P^l(0)$ when $m \ge l \ge 0$. As $k \le n$, $(P^n(0))_i \ge (P^k(0))_i > 0$.

Whether $P_j(x) > 0$ depends only on whether each $x_l > 0$ or not, and not on the actual value of $x_l$. So, for any $k$, whether $(P^{k+1}(0))_j > 0$ depends only on the set $S_k = \{x_l : (P^k(0))_l > 0\}$. From before, $P^{k+1}(0) \ge P^k(0)$, so $S_{k+1} \supseteq S_k$. If ever we have $S_{k+1} = S_k$, then for any $j$, $(P^{k+2}(0))_j > 0$ whenever $(P^{k+1}(0))_j > 0$, so $S_{k+2} = S_{k+1} = S_k$. A strict inclusion $S_{k+1} \supsetneq S_k$ can only occur for $n$ values of $k$, as there are only $n$ variables to add. Consequently $S_{n+1} = S_n$, and so $S_m = S_n$ whenever $m \ge n$. So if we have a $k > n$ with $(P^k(0))_i > 0$, then $(P^n(0))_i > 0$.

(iii) ⇒ (i): By monotonicity and an easy induction, $q^* \ge P^k(0)$ for all $k \ge 0$. In particular, $q^* \ge P^n(0)$. So $q^*_i \ge (P^n(0))_i > 0$. □
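Since only the zero/non-zero pattern matters, Lemma 4.9 yields a purely combinatorial test for which coordinates of $q^*$ are positive. Propagate the support of $P^k(0)$ for $n$ rounds. The sketch below assumes a hypothetical SNF encoding, not taken from the paper: `("L", c0, {j: c_j})` for $x_i = c_0 + \sum_j c_j x_j$, and `("Q", j, k)` for $x_i = x_j x_k$.

```python
def positive_support(eqs):
    """Return {i : (P^n(0))_i > 0}, which by Lemma 4.9 equals {i : q*_i > 0}.

    eqs[i] uses a hypothetical SNF encoding:
      ("L", c0, {j: c_j, ...})  for x_i = c0 + sum_j c_j * x_j
      ("Q", j, k)               for x_i = x_j * x_k
    Only the zero/non-zero pattern is tracked, so no arithmetic is needed.
    """
    n = len(eqs)
    support = set()                      # support of P^0(0) = 0
    for _ in range(n):                   # n rounds suffice by Lemma 4.9
        new = set()
        for i, eq in enumerate(eqs):
            if eq[0] == "L":
                _, c0, coeffs = eq
                if c0 > 0 or any(c > 0 and j in support for j, c in coeffs.items()):
                    new.add(i)
            elif eq[1] in support and eq[2] in support:   # form Q
                new.add(i)
        support = new
    return support


# A chain x0 = x1, x1 = x2, x2 = 0.3 needs all n = 3 rounds to reach x0.
print(positive_support([("L", 0.0, {1: 1.0}), ("L", 0.0, {2: 1.0}), ("L", 0.3, {})]))
```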

Given the maxPPS, $x = P(x)$, with $0 < q^* < 1$, and a vector $y$ that satisfies the conditions of Theorem 4.8, we shall use the following algorithm to obtain the policy we need:

1. Initialize the policy $\sigma$ to any policy such that $P_\sigma(y) = P(y)$.

2. Calculate for which variables $x_i$ in $x = P_\sigma(x)$ we have $(q^*_\sigma)_i = 0$. Let $S_0$ denote this set of variables. (We can do this in P-time; see, e.g., Theorem 2.2 of [15].)

3. If for all $i$ we have $(q^*_\sigma)_i > 0$, i.e., if $S_0 = \emptyset$, then terminate and output the policy $\sigma$.

4. Otherwise, look for a variable $x_i$ such that $P_i(x)$ is of form M, $P_i(x) = \max\{x_j, x_k\}$, and $(q^*_\sigma)_i = 0$. One of $x_j, x_k$, say $x_j$, must have $(q^*_\sigma)_j > 0$, and furthermore $|y_i - y_j| \le 2^{-14|P|-1}\varepsilon$ must hold. (We shall establish that such a pair $x_i$ and $x_j$ will always exist when we are at this step of the algorithm.)

Let $\sigma'$ be the policy that chooses $x_j$ at $x_i$ but is otherwise identical to $\sigma$. Set $\sigma := \sigma'$ and return to step 2.
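A minimal sketch of steps 1 to 4 follows. It reuses the combinatorial zero test of Lemma 4.9 for step 2, a stand-in for the P-time algorithm of [15]. It assumes a hypothetical encoding in which `("M", j, k)` denotes $x_i = \max\{x_j, x_k\}$, and `tol` plays the role of $2^{-14|P|-1}\varepsilon$. The function names are illustrative, not from the paper.

```python
def positive_support(eqs):
    """{i : q*_i > 0} for a PPS, via the support of P^n(0) (Lemma 4.9)."""
    support = set()
    for _ in range(len(eqs)):
        support = {
            i for i, eq in enumerate(eqs)
            if (eq[0] == "L" and (eq[1] > 0 or any(c > 0 and j in support
                                                   for j, c in eq[2].items())))
            or (eq[0] == "Q" and eq[1] in support and eq[2] in support)
        }
    return support


def induced_pps(eqs, policy):
    """P_sigma: replace each x_i = max{x_j, x_k} by x_i = x_{policy[i]}."""
    return [("L", 0.0, {policy[i]: 1.0}) if eq[0] == "M" else eq
            for i, eq in enumerate(eqs)]


def repair_policy(eqs, y, tol):
    """Steps 1-4 of the algorithm above (sketch)."""
    # Step 1: greedy policy with P_sigma(y) = P(y); ties go to x_j.
    policy = {i: (eq[1] if y[eq[1]] >= y[eq[2]] else eq[2])
              for i, eq in enumerate(eqs) if eq[0] == "M"}
    while True:
        pos = positive_support(induced_pps(eqs, policy))   # step 2
        if len(pos) == len(eqs):                            # step 3
            return policy
        for i, eq in enumerate(eqs):                        # step 4
            if eq[0] == "M" and i not in pos:
                good = [v for v in eq[1:] if v in pos and abs(y[i] - y[v]) <= tol]
                if good:
                    policy[i] = good[0]
                    break
        else:
            raise RuntimeError("no switch found: are the hypotheses of Theorem 4.8 met?")


# x0 = max{x1, x2}, x1 = x0, x2 = 0.4 + 0.2*x0, so q* = (0.5, 0.5, 0.5).
eqs = [("M", 1, 2), ("L", 0.0, {0: 1.0}), ("L", 0.4, {0: 0.2})]
print(repair_policy(eqs, [0.5, 0.5, 0.5], tol=1e-3))
```

In this toy example, the greedy choice $x_1$ at $x_0$ satisfies $P_\sigma(y) = P(y)$ but gives $(q^*_\sigma)_0 = 0$. This illustrates why the maxPPS case needs the repair loop.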

Lemma 4.10. The steps of the above algorithm are always well-defined, and the algorithm always terminates with a policy $\sigma$ such that $q^*_\sigma > 0$ and $\|P_\sigma(y) - y\|_\infty \le 2^{-14|P|-1}\varepsilon$.

Proof. Firstly, to show that the steps of the algorithm are always well-defined, we need to show that if there exists an $x_i$ with $(q^*_\sigma)_i = 0$, then step 4 will find some variable to switch to. Suppose there is such an $x_i$. Let $\tau$ be an optimal policy. Then $(q^*_\tau)_i = q^*_i > 0$, so by Lemma 4.9, $(P_\tau^n(0))_i > 0$. For any variable $x_j$ with $(P_\tau(0))_j > 0$, the equation $x_j = P_j(x)$ must have form L and not M, so $(P_\sigma(0))_j > 0$ and hence $(q^*_\sigma)_j > 0$. There must be a least $k$, call it $k_{\min}$, with $1 \le k_{\min} \le n$, such that there is a variable $x_j$ with $(P_\tau^{k_{\min}}(0))_j > 0$ but $(q^*_\sigma)_j = 0$. Let $x_{i'}$ be a variable such that $(P_\tau^{k_{\min}}(0))_{i'} > 0$ but $(q^*_\sigma)_{i'} = 0$.
Suppose that $x_{i'} = P_{i'}(x)$ has form Q; then $P_{i'}(x) = x_j x_l$ for some variables $x_j, x_l$. We have $0 < (P_\tau^{k_{\min}}(0))_{i'} = (P_\tau^{k_{\min}-1}(0))_j \, (P_\tau^{k_{\min}-1}(0))_l$. So $(P_\tau^{k_{\min}-1}(0))_j > 0$ and $(P_\tau^{k_{\min}-1}(0))_l > 0$. The minimality of $k_{\min}$ now gives us that $(q^*_\sigma)_j > 0$ and $(q^*_\sigma)_l > 0$. So $(q^*_\sigma)_{i'} = (q^*_\sigma)_j (q^*_\sigma)_l > 0$. This is a contradiction. Thus, $x_{i'} = P_{i'}(x)$ does not have form Q.

Similarly, $x_{i'} = P_{i'}(x)$ does not have form L. So $x_{i'} = P_{i'}(x)$ has form M: there are variables $x_j, x_l$ with $P_{i'}(x) = \max\{x_j, x_l\}$. Suppose, w.l.o.g., that $(P_\tau(x))_{i'} = x_j$. We have $(P_\tau^{k_{\min}}(0))_{i'} > 0$, and so $(P_\tau^{k_{\min}-1}(0))_j > 0$. By minimality of $k_{\min}$, we have that $(q^*_\sigma)_j > 0$. We also have $(q^*_\sigma)_{i'} = 0$, and so $(P_\sigma(x))_{i'} = x_l$.

Lemma 4.4 applied to the system $x = P_\tau(x)$ gives that $\|P_\tau(y) - y\|_\infty \le 2^{-14|P|-1}\varepsilon$. So $|y_{i'} - y_j| = |y_{i'} - (P_\tau(y))_{i'}| \le 2^{-14|P|-1}\varepsilon$. Thus, step 4 could use $i'$ and change the policy $\sigma$ at $i'$ (i.e., switch $\sigma(i')$) from $x_l$ to $x_j$.

Next, we need to show that the algorithm terminates:

Claim 4.11. If step 4 switches the variable $x_i$ with $P_i(x) = \max\{x_j, x_k\}$ from $(P_\sigma(x))_i = x_k$ to $(P_{\sigma'}(x))_i = x_j$, then

(i) $q^*_{\sigma'} \ge q^*_\sigma$,

(ii) $(q^*_{\sigma'})_i > 0$,

(iii) the set of variables $x_l$ with $(q^*_{\sigma'})_l > 0$ is a strict superset of the set of variables $x_l$ with $(q^*_\sigma)_l > 0$.
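Proof. Recall that step 4 will only switch if $(q^*_\sigma)_i = 0$ and $(q^*_\sigma)_j > 0$.

(i) We show that, for any $t \ge 0$, $P_{\sigma'}^t(0) \ge P_\sigma^t(0)$.

The base case, $t = 1$, is clear, because the only indices $r$ where $(P_\sigma(0))_r \neq 0$ are those where $P_r(x)$ has form L, in which case $(P_\sigma(0))_r = (P_{\sigma'}(0))_r = (P(0))_r$.

For the inductive case, note firstly that $P_\sigma(x)$ and $P_{\sigma'}(x)$ only differ in the $i$th coordinate. Since $(q^*_\sigma)_i = 0$, for any $t$, $(P_\sigma^t(0))_i = 0$. Suppose that $P_{\sigma'}^t(0) \ge P_\sigma^t(0)$. Then, by monotonicity, $P_{\sigma'}^{t+1}(0) \ge P_{\sigma'}(P_\sigma^t(0))$. But $(P_{\sigma'}(P_\sigma^t(0)))_r = (P_\sigma^{t+1}(0))_r$ when $r \neq i$. Furthermore, $(P_{\sigma'}(P_\sigma^t(0)))_i \ge 0 = (P_\sigma^{t+1}(0))_i$. So $P_{\sigma'}(P_\sigma^t(0)) \ge P_\sigma^{t+1}(0)$. We thus have that $P_{\sigma'}^{t+1}(0) \ge P_\sigma^{t+1}(0)$.

We know that as $t \to \infty$, $P_{\sigma'}^t(0) \to q^*_{\sigma'}$ and $P_\sigma^t(0) \to q^*_\sigma$. So $q^*_{\sigma'} \ge q^*_\sigma$.

(ii) We have $(q^*_{\sigma'})_i = (q^*_{\sigma'})_j$. By (i), $(q^*_{\sigma'})_j \ge (q^*_\sigma)_j$. We chose $x_j$ such that $(q^*_\sigma)_j > 0$. So $(q^*_{\sigma'})_i > 0$.

(iii) If $(q^*_\sigma)_l > 0$, then by (i), $(q^*_{\sigma'})_l > 0$. Also, $(q^*_\sigma)_i = 0$ and, by (ii), $(q^*_{\sigma'})_i > 0$.

□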

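Thus, if at some stage of the algorithm we do not yet have $q^*_\sigma > 0$, then step 4 always gives us a new $\sigma'$ with more coordinates having $(q^*_{\sigma'})_i > 0$. Furthermore, if $\|P_\sigma(y) - y\|_\infty \le 2^{-14|P|-1}\varepsilon$, then $\|P_{\sigma'}(y) - y\|_\infty \le 2^{-14|P|-1}\varepsilon$. This holds since the only changed coordinate $i$ satisfies $|y_i - y_j| \le 2^{-14|P|-1}\varepsilon$. Our starting policy has $\|P_\sigma(y) - y\|_\infty = \|P(y) - y\|_\infty \le 2^{-14|P|-1}\varepsilon$. The algorithm therefore terminates (after at most $n$ switches) and gives a $\sigma$ with $q^*_\sigma > 0$ and $\|P_\sigma(y) - y\|_\infty \le 2^{-14|P|-1}\varepsilon$. □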

We can now complete the proof of the Theorem:

Proof of Theorem 4.8. Using the algorithm, we find a $\sigma$ with $\|y - P_\sigma(y)\|_\infty \le 2^{-14|P|-1}\varepsilon$ and $q^*_\sigma > 0$. By Proposition 4.5, $q^*_\sigma < 1$. We apply Theorem 4.6 (i) to the PPS $x = P_\sigma(x)$ at the point $y := q^*$ (not to be confused with the $y$ in the statement of Theorem 4.8). This gives that

$$\|(I - P'_\sigma(\tfrac{1}{2}(q^* + q^*_\sigma)))^{-1}\|_\infty \le 2^{10|P_\sigma|}\max\left\{\frac{2}{(1 - q^*)_{\min}}, 2^{|P_\sigma|}\right\}$$
We have $|P_\sigma| \le |P|$. Also, since there always exists an optimal policy, Theorem 3.12 of [11] implies that $(1 - q^*)_{\min} \ge 2^{-4|P|}$. So

$$\|(I - P'_\sigma(\tfrac{1}{2}(q^* + q^*_\sigma)))^{-1}\|_\infty \le 2^{14|P|+1} \tag{14}$$

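We cannot use Theorem 4.2 as stated, because we need not have $P(y) = P_\sigma(y)$. We do, however, have

$$\|P_\sigma(y) - y\|_\infty \le 2^{-14|P|-1}\varepsilon \tag{15}$$

Applying Lemma 4.3 and taking norms, we get the inequality

$$\|q^*_\sigma - y\|_\infty \le \|(I - P'_\sigma(\tfrac{1}{2}(q^*_\sigma + y)))^{-1}\|_\infty \, \|P_\sigma(y) - y\|_\infty \tag{16}$$

Combining (14), (15) and (16) yields:

$$\|q^*_\sigma - y\|_\infty \le \tfrac{1}{2}\varepsilon$$

so $\|q^* - q^*_\sigma\|_\infty \le \|q^*_\sigma - y\|_\infty + \|q^* - y\|_\infty \le \frac{1}{2}\varepsilon + 2^{-14|P|-2}\varepsilon < \varepsilon$. □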

Theorem 4.12. Given a max/minPPS, $x = P(x)$, and given $\varepsilon > 0$, we can compute an $\varepsilon$-optimal policy for $x = P(x)$ in time $\mathrm{poly}(|P|, \log(1/\varepsilon))$.

Proof. First, we use the algorithms from [14] to detect variables $x_i$ with $q^*_i = 0$ or $q^*_i = 1$ in time polynomial in $|P|$. Then we can remove these from the max/minPPS by substituting the known values into the equations for the other variables. This gives us a max/minPPS with least fixed point $0 < q'^* < 1$, and does not increase $|P|$. To use either Theorem 4.8 or Theorem 4.7, it suffices to have a $y$ with $0 \le y \le q^*$ and $\|q^* - y\|_\infty \le 2^{-14|P|-3}\varepsilon$. Theorem 3.21 says that we can find such a $y$ in time polynomial in $|P|$ and $14|P| - \log(\varepsilon)$, which is polynomial in $|P|$ and $\log(1/\varepsilon)$, as required. Now, depending on whether we have a maxPPS or a minPPS, Theorem 4.8 or Theorem 4.7 shows that from this $y$ we can find an $\varepsilon$-optimal policy for the max/minPPS with $0 < q'^* < 1$. This takes time polynomial in $|P|$ and $\log(1/\varepsilon)$. All that is left to show is that we can extend this policy to the variables $x_i$ where $q^*_i = 0$ or $q^*_i = 1$ while still remaining $\varepsilon$-optimal. We next show how this can be done.

For a minPPS, if $q^*_i = 1$, then for any policy $\sigma$, $(q^*_\sigma)_i = 1$, so the choice made at such variables $x_i$ is irrelevant. Similarly, for maxPPSs, when $q^*_i = 0$, any choice at $x_i$ is optimal.

For a minPPS with $q^*_i = 0$, if $P_i(x)$ has form M, we can choose any variable $x_j$ with $q^*_j = 0$. There is such a variable: if $P_i(x) = \min\{x_j, x_k\}$ and $q^*_i = 0$, then either $q^*_j = 0$ or $q^*_k = 0$. Let $\sigma$ be a policy such that for each variable $x_i$ with $q^*_i = 0$ and $P_i(x)$ of form M, $q^*_{\sigma(i)} = 0$. We need to show that $(q^*_\sigma)_i = 0$ for all such variables. Suppose that, for some $k \ge 0$, $(P_\sigma^k(0))_i = 0$ for all $x_i$ such that $q^*_i = 0$. Then $(P(P_\sigma^k(0)))_i = 0$ for all $x_i$ with $q^*_i = 0$.

To see why this is so, note that whether or not $P_i(z) = 0$ depends only on which coordinates of $z$ are 0. Furthermore, suppose $P_i(z) = 0$ when the set of zero coordinates of $z$ is $S$. Then for any vector $z'$ whose set of zero coordinates is $S' \supseteq S$, we have $P_i(z') = 0$. The set $S$ of zero coordinates of $q^*$ is a subset of the set $S'$ of zero coordinates of $P_\sigma^k(0)$, and $P_i(q^*) = q^*_i = 0$. Hence $(P(P_\sigma^k(0)))_i = 0$.

If $P_i(x) = \min\{x_j, x_k\}$ and $q^*_i = 0$, then either $q^*_j = 0$ or $q^*_k = 0$. Suppose, w.l.o.g., that $(P_\sigma(x))_i = x_j$. Then $q^*_j = 0$, so by assumption $(P_\sigma^k(0))_j = 0$, and so $(P_\sigma(P_\sigma^k(0)))_i = 0$. We now have enough for $(P_\sigma^{k+1}(0))_j = 0$ for each variable $x_j$ with $q^*_j = 0$. Since $P_\sigma^0(0) = 0$, by induction, for all $k \ge 0$, $(P_\sigma^k(0))_j = 0$ for all $x_j$ with $q^*_j = 0$. From this, for each variable $x_j$ with $q^*_j = 0$, we get $(q^*_\sigma)_j = 0$.
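The choice rule in the minPPS case is simple enough to state as code. The sketch below assumes the zero set $\{i : q^*_i = 0\}$ has already been computed, for instance by the P-time algorithms of [14]. It also uses a hypothetical encoding `{i: (j, k)}` for the form-M equations $x_i = \min\{x_j, x_k\}$.

```python
def min_zero_choices(min_eqs, zero_set):
    """For each x_i = min{x_j, x_k} with q*_i = 0, pick an argument with q* = 0.

    min_eqs:  hypothetical encoding {i: (j, k)} of the form-M (min) equations.
    zero_set: {i : q*_i = 0}, assumed to be precomputed.
    """
    choices = {}
    for i, (j, k) in min_eqs.items():
        if i in zero_set:
            # q*_i = min{q*_j, q*_k} = 0, so at least one argument lies in zero_set.
            choices[i] = j if j in zero_set else k
    return choices


print(min_zero_choices({0: (1, 2), 3: (0, 1)}, {0, 2, 3}))
```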

The case of a maxPPS that has variables with $q^*_i = 1$ is not so simple. The P-time algorithm given in [14] to detect variables with $q^*_i = 1$ produces a partial randomised policy for such variables (Lemma 12 in [14]). A randomised policy is a map $\rho : M \to [0, 1]$ that turns a maxPPS $x = P(x)$ into a PPS $x = P_\rho(x)$. It does so by replacing equations of form M, $P_i(x) = \max\{x_j, x_k\}$, with equations of form L, $P_i(x) = \rho(i)x_j + (1 - \rho(i))x_k$. We would prefer a non-randomised (pure) policy $\sigma$ with $(q^*_\sigma)_i = 1$ for all variables $x_i$ with $q^*_i = 1$. Theorem 2.5 (which quotes Theorem 2 of [14]) guarantees the existence of such a $\sigma$.

We can construct such a pure optimal partial policy as follows. We start with $P^{(0)}(x) = P(x)$. Given an $x_i$ with $(P^{(l)}(x))_i = \max\{x_j, x_k\}$ and $(q^*_{(l)})_i = 1$, we try setting $(P^{(l+1)}(x))_i = x_j$ and see if this gives $(q^*_{(l+1)})_i = 1$. If it does, then we set $(P^{(l+1)}(x))_i = x_j$; if it does not, then we set $(P^{(l+1)}(x))_i = x_k$. We can argue inductively that the LFP $q^*_{(l)}$ of $x = P^{(l)}(x)$ is equal to the LFP $q^*$ of $x = P(x)$ for all $l$. The basis, $l = 0$, is clear. For the induction step, we know from Theorem 2.5 that there is an optimal policy $\sigma$ for the maxPPS $x = P^{(l)}(x)$. If $\sigma$ does not have $\sigma(i) = j$, then $\sigma(i) = k$. So if setting $(P^{(l+1)}(x))_i = x_j$ would not give $(q^*_{(l+1)})_i = 1$, then $(P^{(l+1)}(x))_i = x_k$ does give $(q^*_{(l+1)})_i = 1$. We have that $(q^*_{(l+1)})_r = (q^*_{(l)})_r$ when $r \neq i$, so $q^*_{(l+1)} = q^*_{(l)}$. Once there is no $x_i$ with $(P^{(l)}(x))_i = \max\{x_j, x_k\}$ and $(q^*_{(l)})_i = 1$, we have found a pure partial optimal policy for the variables $x_i$ with $q^*_i = 1$. This requires no more than $n$ calls to the polynomial-time algorithm given in [14] for determining, for a maxPPS $x = P(x)$, those coordinates $i$ such that $q^*_i = 1$. □
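The construction above can be sketched with the value-1 test passed in as an oracle. The oracle is the P-time algorithm of [14], which is not implemented here. As a loudly hypothetical stand-in for tiny examples, `crude_value_one` runs value iteration and thresholds at $1 - 10^{-9}$. It is not that algorithm and can misclassify slowly converging instances. The encoding `("L", c0, {j: c})`, `("Q", j, k)`, `("M", j, k)` (max) is likewise an assumption. Following the argument above, this sketch computes the value-1 set once, since $q^*$ does not change. It then makes one trial oracle call per form-M variable in that set.

```python
def pure_value_one_policy(eqs, value_one):
    """Fix form-M (max) equations at variables with q*_i = 1, one at a time.

    value_one: oracle returning {i : q*_i = 1} for a system in the encoding
               ("L", c0, {j: c}), ("Q", j, k), ("M", j, k) = max.
    """
    eqs = list(eqs)
    ones = value_one(eqs)        # q* is unchanged by the switches (see text)
    policy = {}
    for i in sorted(ones):
        if eqs[i][0] != "M":
            continue
        _, j, k = eqs[i]
        trial = eqs[:i] + [("L", 0.0, {j: 1.0})] + eqs[i + 1:]
        choice = j if i in value_one(trial) else k
        eqs[i] = ("L", 0.0, {choice: 1.0})
        policy[i] = choice
    return policy


def crude_value_one(eqs, iters=1000, tol=1e-9):
    """Toy stand-in oracle (NOT the algorithm of [14]): value iteration from 0."""
    x = [0.0] * len(eqs)
    for _ in range(iters):
        x = [eq[1] + sum(c * x[j] for j, c in eq[2].items()) if eq[0] == "L"
             else x[eq[1]] * x[eq[2]] if eq[0] == "Q"
             else max(x[eq[1]], x[eq[2]])
             for eq in eqs]
    return {i for i, v in enumerate(x) if v >= 1 - tol}


# x0 = max{x2, x1}, x1 = 1, x2 = x2: trying x2 first fails, so x1 is chosen.
eqs = [("M", 2, 1), ("L", 1.0, {}), ("L", 0.0, {2: 1.0})]
print(pure_value_one_policy(eqs, crude_value_one))
```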

5 Approximating the value of BSSGs in FNP 

In this section we briefly note that, as an easy corollary of our results for BMDPs, we can obtain a TFNP (total NP search problem) upper bound for computing (approximately) the value of branching simple stochastic games (BSSGs), where the objective of the two players is to maximize, and minimize, the extinction probability. For relevant definitions and background results about these games, see [14]. It suffices for our purposes here to point out that, as shown in [14], the value of these games (which are determined) is characterized by the LFP solution of associated min-maxPPSs, $x = P(x)$, where both min and max operators can occur in the equations for different variables. Furthermore, both players have optimal policies (i.e., optimal pure, memoryless strategies) in these games (see [14]).

Corollary 5.1. Given a max-minPPS, $x = P(x)$, and a rational $\varepsilon > 0$, the problem of approximating the LFP $q^*$ of $x = P(x)$ is in TFNP. Here approximating means computing a vector $v$ such that $\|q^* - v\|_\infty \le \varepsilon$. The problem of computing $\varepsilon$-optimal policies for both players is also in TFNP. (And thus also, the problem of approximating the value, and computing $\varepsilon$-optimal strategies, for BSSGs is in FNP.)

Proof. Given $x = P(x)$, whose LFP $q^*$ we wish to compute, first guess pure policies $\sigma$ and $\tau$ for the max and min players, respectively. Then fix $\sigma$ as max's strategy. For the resulting minPPS (with LFP $q^*_\sigma$), use our algorithm to compute in P-time an approximate value vector $v_\sigma$ such that $\|v_\sigma - q^*_\sigma\|_\infty \le \varepsilon/4$. Next, fix $\tau$ as min's strategy. For the resulting maxPPS (with LFP $q^*_\tau$), use our algorithm to compute in P-time an approximate value vector $v_\tau$ such that $\|v_\tau - q^*_\tau\|_\infty \le \varepsilon/4$. Finally, check whether $\|v_\sigma - v_\tau\|_\infty \le \varepsilon/4$. If not, then reject this "guess". If so, then output $\sigma$ and $\tau$ as $\varepsilon$-optimal policies for max and min, respectively, and output $v := v_\sigma$ (or $v := v_\tau$) as an $\varepsilon$-approximation of the LFP $q^*$. This procedure is correct because if $q^*$ is the LFP of the min-maxPPS $x = P(x)$, then $q^*_\sigma \le q^* \le q^*_\tau$, and thus:

$$\begin{aligned}
\|q^* - v_\sigma\|_\infty &\le \|q^* - q^*_\sigma\|_\infty + \|q^*_\sigma - v_\sigma\|_\infty \le \|q^*_\tau - q^*_\sigma\|_\infty + \|q^*_\sigma - v_\sigma\|_\infty \\
&\le \|q^*_\tau - v_\tau\|_\infty + \|v_\tau - v_\sigma\|_\infty + \|v_\sigma - q^*_\sigma\|_\infty + \|q^*_\sigma - v_\sigma\|_\infty \le \varepsilon
\end{aligned}$$

And likewise for $v_\tau$. □



It is worth noting that approximating the value of a BSSG to within a desired $\varepsilon > 0$, when $\varepsilon$ is given as part of the input, is already at least as hard as computing the exact value of Condon's finite-state simple stochastic games (SSGs) [5]. Thus one cannot hope for a P-time upper bound without a breakthrough. In fact, it was shown in [14] that even the qualitative problem of deciding whether the value $q^*_i = 1$ for a given BSSG (or max-minPPS) is already at least as hard as Condon's quantitative decision problem for finite-state simple stochastic games. That qualitative problem was shown there to be in NP ∩ coNP. (Whereas for finite-state SSGs, the qualitative problem of deciding whether the value is 1 is in P-time.)

A Omitted material from Section 2
A.1 Proof of Lemma 2.12

As usual, we always assume, w.l.o.g., that any MPS or PPS is in SNF form. Recall that for a square matrix $A$, $\rho(A)$ denotes its spectral radius.

Lemma 2.12. Given a PPS, $x = P(x)$, with LFP $q^* > 0$, if $0 \le y \le q^*$ and $y < 1$, then $\rho(P'(y)) < 1$, and $(I - P'(y))^{-1}$ exists and is non-negative.

We first recall several closely related results established in our previous papers. Recall that a PPS, $x = P(x)$, is called strongly connected if its variable dependency graph $H$ is strongly connected.

Lemma A.1. (Lemma 6.5 of [15]⁴) Let $x = P(x)$ be a strongly connected PPS in $n$ variables, with LFP $q^* > 0$. For any vector $0 \le y < q^*$, $\rho(P'(y)) < 1$, and thus $(I - P'(y))^{-1}$ exists and is nonnegative.

Theorem A.2. (Theorem 3.6 of [11]) For any PPS, $x = P(x)$, in SNF form, which has LFP $0 < q^* < 1$, for all $0 \le y \le q^*$, $\rho(P'(y)) < 1$ and $(I - P'(y))^{-1}$ exists and is nonnegative.

Proof of Lemma 2.12. Consider a PPS, $x = P(x)$, with LFP $q^* > 0$, and a vector $0 \le y \le q^*$ such that $y < 1$. All we need to establish is that $\rho(P'(y)) < 1$. By standard facts (see, e.g., [17]), it then follows that $(I - P'(y))^{-1}$ exists and is equal to $\sum_{i=0}^{\infty} (P'(y))^i \ge 0$.

Let us first show that if $x = P(x)$ is strongly connected, then $\rho(P'(y)) < 1$. In that case every variable depends on every other. So if there exists any $i \in \{1, \ldots, n\}$ such that $q^*_i < 1$, then for all $j \in \{1, \ldots, n\}$ we must have $q^*_j < 1$. Thus, either $q^* = 1$, or else $0 < q^* < 1$. If $q^* = 1$, then since $y < 1$, we have $y < q^*$, and thus, by Lemma A.1, $\rho(P'(y)) < 1$. If, on the other hand, $0 < q^* < 1$, then since $0 \le y \le q^*$, Theorem A.2 gives $\rho(P'(y)) < 1$.

⁴ Lemma 6.5 of [15] is actually a more general result, relating to strongly connected MPSs that arise from more general RMCs.

Next, consider an arbitrary PPS, $x = P(x)$, that is not necessarily strongly connected. Recall the variable dependency graph $H$ of $x = P(x)$. We can partition the variables into sets $S_1, \ldots, S_k$ which form the SCCs of $H$. Consider the DAG $D$ of SCCs, whose nodes are the sets $S_i$. There is an edge from $S_i$ to $S_j$ iff the dependency graph $H$ has a node $i' \in S_i$ with an edge to a node $j' \in S_j$.

Consider the matrix $P'(y)$. Our aim is to show that $\rho(P'(y)) < 1$. We assume $q^* > 0$, $0 \le y \le q^*$, and $y < 1$. It clearly suffices to show that $\rho(P'(y)) < 1$ when we additionally insist that $y > 0$. For any other $z$ with $0 \le z \le y$, we would then have $\rho(P'(z)) \le \rho(P'(y)) < 1$.

So, assuming also that $y > 0$, consider the $n \times n$ matrix $P'(y)$. To keep notation clean, we let $A := P'(y)$. For the $n \times n$ matrix $A$, we can consider its underlying dependency graph, $H = (\{1, \ldots, n\}, E_H)$, whose nodes are $\{1, \ldots, n\}$, and where there is an edge from $i$ to $j$ iff $A_{i,j} > 0$. Since $y > 0$, this graph is precisely the dependency graph $H$ of $x = P(x)$. It thus has the same SCCs, and the same DAG of SCCs, $D$. Let us sort the SCCs, so that we can assume $S_1, \ldots, S_k$ are topologically sorted with respect to the partial ordering defined by the DAG $D$. In other words, for any variable indices $i \in S_a$ and $j \in S_b$, if $(i, j) \in E_H$, then $a \le b$.

Let $S \subseteq \{1, \ldots, n\}$ be any non-empty subset of indices, and let $A[S]$ denote the principal submatrix of $A$ defined by indices in $S$. It is a well-known fact that $0 \le \rho(A[S]) \le \rho(A)$. (See, e.g., Corollary 8.1.20 of [17].)

Since $A \ge 0$, $\rho(A)$ is an eigenvalue of $A$, and has an associated non-negative eigenvector $v \ge 0$, $v \neq 0$ (again see, e.g., Chapter 8 of [17]). In other words,

$$Av = \rho(A)v$$

Firstly, if $\rho(A) = 0$, then we are trivially done. So we can assume, w.l.o.g., that $\rho(A) > 0$. Now, suppose $v_i > 0$. Then for every $j$ such that $(j, i) \in E_H$, we have $(Av)_j > 0$, and since $(Av)_j = \rho(A)v_j$, we have $v_j > 0$. Hence, repeating this argument, if $v_i > 0$, then for every $j$ that has a path to $i$ in the dependency graph $H$, we have $v_j > 0$.

Since $v \neq 0$, there must exist some SCC $S_c$ of $H$ such that $v_i > 0$ for every variable index $i \in S_c$. We take $c$ to be the maximum index for such an SCC in the topologically sorted list $S_1, \ldots, S_k$. That is, for all $d > c$ and for all $j \in S_d$, we have $v_j = 0$.

First, let us note that $S_c$ must be a non-trivial SCC. Specifically, call an SCC $S_r$ of $H$ trivial if $S_r = \{i\}$ consists of only a single variable index $i$, and furthermore $0 = (A)_i = (P'(y))_i$, i.e., row $i$ of the matrix $A$ is all zero. This cannot be the case for $S_c$, because for any variable $i \in S_c$ we have $v_i > 0$, and thus $(Av)_i = \rho(A)v_i > 0$.

Let us consider the principal submatrix $A[S_c]$ of $A$. We claim that $\rho(A[S_c]) = \rho(A)$. To see why, note that $Av = \rho(A)v$, so for every $i \in S_c$ we have $(Av)_i = \sum_j a_{i,j}v_j = \rho(A)v_i$. But $v_j = 0$ for every $j \in S_d$ such that $d > c$. Furthermore, $a_{i,j} = 0$ for every $j \in S_{d'}$ such that $d' < c$.

Thus, let $v_{S_c}$ denote the subvector of $v$ corresponding to the indices in $S_c$. We have just established that $A[S_c]v_{S_c} = \rho(A)v_{S_c}$, and thus that $\rho(A[S_c]) \ge \rho(A)$. But since $A[S_c]$ is a principal submatrix of $A$, we also know easily (see, e.g., Corollary 8.1.20 of [17]) that $\rho(A[S_c]) \le \rho(A)$, so $\rho(A[S_c]) = \rho(A)$.

We are almost done. Given the original PPS, $x = P(x)$, take any subset $S \subseteq \{1, \ldots, n\}$ of variable indices. Let $x_S = P_S(x_S, x_{D_S})$ denote the subsystem of $x = P(x)$ associated with the vector $x_S$ of variables in $S$, where $x_{D_S}$ denotes the variables not in $S$.

Now, note that $x_{S_c} = P_{S_c}(x_{S_c}, y_{D_{S_c}})$ is itself a PPS. Furthermore, it is a strongly connected PPS, precisely because $S_c$ is a strongly connected component of the dependency graph $H$, and because $y > 0$. Moreover, consider the Jacobian matrix of $P_{S_c}(x_{S_c}, y_{D_{S_c}})$ evaluated at $y_{S_c}$, which we denote by $P'_{S_c}(y)$. It is precisely the principal submatrix $A[S_c]$ of $A$. Since $x_{S_c} = P_{S_c}(x_{S_c}, y_{D_{S_c}})$ is a strongly connected PPS, we have already argued that $\rho(P'_{S_c}(y)) < 1$. Thus, since $P'_{S_c}(y) = A[S_c]$, we have $\rho(A[S_c]) = \rho(A) < 1$. This completes the proof. □
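The key step of this proof can be checked numerically: the spectral radius of a non-negative matrix is attained on the principal submatrix of one SCC of its dependency graph. Up to a permutation, such a matrix is block-triangular with the SCC blocks on the diagonal. The matrix below is made up for illustration, and the helper names are hypothetical.

```python
import numpy as np


def spectral_radius(A):
    """rho(A): largest eigenvalue modulus."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))


def sccs(A):
    """SCCs of the graph with an edge i -> j iff A[i, j] > 0.

    Uses a boolean transitive closure (Warshall), which is fine for small n.
    """
    n = A.shape[0]
    reach = (A > 0) | np.eye(n, dtype=bool)
    for k in range(n):
        reach = reach | (reach[:, [k]] & reach[[k], :])
    comps, seen = [], set()
    for i in range(n):
        if i not in seen:
            comp = [j for j in range(n) if reach[i, j] and reach[j, i]]
            seen.update(comp)
            comps.append(comp)
    return comps


def rho_from_sccs(A):
    """max over SCCs S of rho(A[S]); equals rho(A) (block-triangular up to permutation)."""
    return max(spectral_radius(A[np.ix_(c, c)]) for c in sccs(A))


A = np.array([[0.2, 0.5, 0.0],
              [0.3, 0.1, 0.4],
              [0.0, 0.0, 0.6]])
print(sccs(A), rho_from_sccs(A), spectral_radius(A))
```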

B Omitted Material from Section 3
B.1 Proof of Theorem 3.21

Theorem 3.21. Let $x = P(x)$ be any max/minPPS with LFP $0 < q^* < 1$. Suppose we use the "rounded-down-GNM" algorithm with rounding parameter $h = j + 2 + 4|P|$. Then the iterations are all defined, and for every $k \ge 0$ we have $0 \le x^{(k)} \le q^*$. Furthermore, after $h = j + 2 + 4|P|$ iterations we have:

$$\|q^* - x^{(j+2+4|P|)}\|_\infty \le 2^{-j}$$
We prove this using a few lemmas. 

Lemma B.1. If we run rounded-down-GNM starting with $x^{(0)} := 0$ on a max/minPPS, $x = P(x)$, with LFP $q^*$, $0 < q^* < 1$, then for all $k \ge 0$, $x^{(k)}$ is well-defined and $0 \le x^{(k)} \le q^*$.

Proof. The base case, $x^{(0)} = 0$, is immediate for both maxPPSs and minPPSs.

For the induction step, suppose the claim holds for $k$, and thus $0 \le x^{(k)} \le q^*$. From Proposition 3.7, $I(x^{(k)})$ is well-defined and $I(x^{(k)}) \le q^*$. Furthermore, $x^{(k+1)}$ is obtained from $I(x^{(k)})$ by rounding down all coordinates, except that any negative coordinates are set to 0. Since obviously $q^* \ge 0$, we have that $0 \le x^{(k+1)} \le q^*$.

□
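Here is a minimal sketch of the rounding step, assuming it rounds each coordinate down to a multiple of $2^{-h}$ and clamps negative values to 0. This matches the description in the proof of Lemma B.1 and the "at most $2^{-h}$" bound used in the proof of Lemma B.2; the exact definition is given in Section 3. Using exact rationals keeps the sketch free of floating-point error.

```python
import math
from fractions import Fraction


def round_down(v, h):
    """Round v down to a multiple of 2^{-h}; negative results become 0.

    Assumed model of the rounding step in rounded-down-GNM; exact for Fractions.
    """
    scaled = math.floor(v * 2 ** h)
    return max(Fraction(scaled, 2 ** h), Fraction(0))


print(round_down(Fraction(7, 10), 3), round_down(Fraction(-1, 3), 3))
```

For non-negative inputs, the loss is always below $2^{-h}$, which is the property used in inequality (19).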

Lemma B.2. For a max/minPPS, x = P{x), with LFP q* , such that < q* < 1, if we apply 
rounded-down- GNM with parameter h, starting at x^ := 0, then for all j' > 0, we have: 



x x 



(i'+i)|| < 2~i' + 2-^+ 1+4 l p l 



Proof. Since x^ := 0: 



* (0) = q* < 1 < — (1 - q*) (17) 



For any k > 0, if q* — x^ < A(l — q*), then by Proposition [3]7^ which was proved separately for 
maxPPSs and minPPSs, in Lemmas 13 . 1 1 1 and 13 . 1 8 1 respectively), we have: 
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g* -/(*<*>) <(^)(l-g*) (18) 
Observe that after every iteration k > 0, in every coordinate i we have: 

xf ] >I{x (k ~%-2- h (19) 

This holds simply because we round $I(x^{(k-1)})_i$ down by at most $2^{-h}$, unless it is negative, in which case $x_i^{(k)} = 0 > I(x^{(k-1)})_i$. Combining inequalities (18) and (19) yields the following inequality:

$$q^* - x^{(k+1)} \leq \frac{\lambda}{2}(1 - q^*) + 2^{-h} \leq \left(\frac{\lambda}{2} + \frac{2^{-h}}{(1 - q^*)_{\min}}\right)(1 - q^*)$$

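Taking inequality (17) as the base case (with $\lambda = \frac{1}{(1 - q^*)_{\min}}$), by induction on $k$, for all $k \geq 0$:

$$q^* - x^{(k+1)} \leq \left(2^{-(k+1)} + \sum_{i=0}^{k} 2^{-(h+i)}\right)\frac{1}{(1 - q^*)_{\min}}(1 - q^*)$$

But $\sum_{i=0}^{k} 2^{-(h+i)} < 2^{-h+1}$, and $\frac{\|1 - q^*\|_\infty}{(1 - q^*)_{\min}} \leq \frac{1}{(1 - q^*)_{\min}} \leq 2^{4|P|}$, by the lower bound $(1 - q^*)_{\min} \geq 2^{-4|P|}$ established earlier. Clearly, we also have $q^* - x^{(k)} \geq 0$ for all $k$ (Lemma B.1), so the coordinate-wise upper bound above also bounds the norm. Thus we have shown that for all $k \geq 0$:

$$\|q^* - x^{(k+1)}\|_\infty \leq \left(2^{-(k+1)} + 2^{-h+1}\right)2^{4|P|} \leq 2^{-k+4|P|} + 2^{-h+1+4|P|}$$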

□ 
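The induction in the proof of Lemma B.2 unrolls the recurrence $\lambda_{k+1} = \lambda_k/2 + 2^{-h}/(1 - q^*)_{\min}$, starting from $\lambda_0 = 1/(1 - q^*)_{\min}$ as in inequality (17). The following small Python check, using exact rationals, confirms on a few sample values the closed form $\lambda_k = (2^{-k} + \sum_{i=0}^{k-1}2^{-(h+i)})/(1 - q^*)_{\min}$ and the bound $\lambda_k < (2^{-k} + 2^{-h+1})/(1 - q^*)_{\min}$. The parameter `m`, standing for $(1 - q^*)_{\min}$, and the sample values of `h` are illustrative assumptions.

```python
from fractions import Fraction


def lambdas(h, m, steps):
    """lambda_0 = 1/m and lambda_{k+1} = lambda_k / 2 + 2^-h / m, computed exactly."""
    lam = [Fraction(1) / m]
    for _ in range(steps):
        lam.append(lam[-1] / 2 + Fraction(1, 2 ** h) / m)
    return lam


def closed_form(k, h, m):
    """(2^-k + sum_{i=0}^{k-1} 2^-(h+i)) / m."""
    total = Fraction(1, 2 ** k) + sum(Fraction(1, 2 ** (h + i)) for i in range(k))
    return total / m


checks = []
for h in (3, 10):
    m = Fraction(1, 7)  # stands for (1 - q*)_min
    lam = lambdas(h, m, 20)
    for k, value in enumerate(lam):
        exact = value == closed_form(k, h, m)
        bounded = value < (Fraction(1, 2 ** k) + Fraction(2, 2 ** h)) / m
        checks.append(exact and bounded)
print(all(checks))  # True
```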

Proof of Theorem 3.21. In Lemma B.2, let $j' := j + 4|P| + 1$ and $h := j + 2 + 4|P|$. We have:

$$\|q^* - x^{(j+2+4|P|)}\|_\infty \leq 2^{-(j+1)} + 2^{-(j+1)} = 2^{-j}$$

□



C Omitted Material from Section 4
C.1 Bounds on the norm of $(I - P'(x))^{-1}$

We aim to prove Theorem 4.6, which we restate here. Let us first recall some definitions related to the dependency graph of variables in a PPS.

For a PPS, $x = P(x)$, with $n$ variables, its variable dependency graph is defined to be the digraph $H = (V, E)$, with vertices $V = \{x_1, \ldots, x_n\}$, such that $(x_i, x_j) \in E$ iff in $P_i(x) = \sum_{r \in R_i} p_r x^{v(\alpha_r)}$ there is a coefficient $p_r > 0$ such that $v(\alpha_r)_j > 0$. Intuitively, $(x_i, x_j) \in E$ means that $x_i$ "depends directly" on $x_j$. An MPS or PPS, $x = P(x)$, is called strongly connected if its dependency graph $H$ is strongly connected.
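As an illustration of this definition, the following Python sketch builds the dependency graph of a PPS and tests strong connectivity. The encoding is a hypothetical one chosen for the example: `system[i]` lists the terms of $P_i(x)$ as pairs `(p_r, monomial)`, where `monomial` maps a variable index $j$ to the exponent $v(\alpha_r)_j$. Strong connectivity is checked by a breadth-first search from one vertex in $H$ and in the reversed graph.

```python
from collections import deque


def dependency_graph(system):
    """Edges (i, j) of H: x_j occurs with positive coefficient in P_i(x)."""
    edges = {i: set() for i in range(len(system))}
    for i, terms in enumerate(system):
        for p_r, monomial in terms:
            if p_r > 0:
                edges[i].update(j for j, e in monomial.items() if e > 0)
    return edges


def reachable(edges, start):
    """Vertices reachable from start (including start), by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in edges[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def strongly_connected(edges):
    """H is strongly connected iff vertex 0 reaches every vertex in H and in reversed H."""
    n = len(edges)
    if n == 0:
        return True
    reverse = {i: set() for i in range(n)}
    for u, targets in edges.items():
        for w in targets:
            reverse[w].add(u)
    return len(reachable(edges, 0)) == n and len(reachable(reverse, 0)) == n


# Hypothetical encodings: x_0 = 0.5 x_1^2 + 0.5, x_1 = 0.3 x_0 + 0.7 (cyclic) ...
cyclic = [[(0.5, {1: 2}), (0.5, {})], [(0.3, {0: 1}), (0.7, {})]]
# ... and x_0 = x_1, x_1 = 1 (x_1 does not depend on x_0).
chain = [[(1.0, {1: 1})], [(1.0, {})]]
print(strongly_connected(dependency_graph(cyclic)))  # True
print(strongly_connected(dependency_graph(chain)))   # False
```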

Theorem 4.6. If $x = P(x)$ is a PPS with LFP $q^* > 0$, then:
(i) If $q^* < 1$ and $0 \leq y < 1$, then $(I - P'(\frac{1}{2}(y + q^*)))^{-1}$ exists and is non-negative, and

$$\|(I - P'(\tfrac{1}{2}(y + q^*)))^{-1}\|_\infty \leq 2^{10|P|}\max\{2(1 - y)_{\min}^{-1}, 2^{|P|}\}$$






(ii) If $q^* = 1$ and $x = P(x)$ is strongly connected (i.e., every variable depends on every other) and $0 \leq y < 1 = q^*$, then $(I - P'(y))^{-1}$ exists and is non-negative, and

$$\|(I - P'(y))^{-1}\|_\infty \leq 2^{4|P|}\frac{1}{(1 - y)_{\min}}$$

Before proving this Theorem, we shall need to develop some more definitions and lemmas. 

Definition C.1. A path in the dependency graph $H = (V, E)$ of a PPS $x = P(x)$ is a sequence of variables $x_{k_1}, \ldots, x_{k_m}$, with $m \geq 2$, such that $(x_{k_i}, x_{k_{i+1}}) \in E$ for $i \in \{1, \ldots, m-1\}$. In other words, for each $i \in \{1, \ldots, m-1\}$, $x_{k_{i+1}}$ appears (with a non-zero coefficient) in the polynomial $P_{k_i}(x)$.

We say that $x_i$ depends on $x_j$ (directly or indirectly) if there is a path in the dependency graph starting at $x_i$ and ending at $x_j$.

We shall need to be more quantitative about dependency: 

Lemma C.2. Given a PPS $x = P(x)$ in SNF form, and variables $x_i, x_j$:

(i) If $x_i$ depends on $x_j$, then there is a positive integer $k$, with $1 \leq k \leq n$, such that

$$(P'(1)^k)_{ij} \geq 2^{-|P|}$$

(ii) If $(P'(1)^k)_{ij} > 0$ for some positive integer $k$, with $1 \leq k \leq n$, then $x_i$ depends on $x_j$.

(iii) If $x_i$ depends on $x_j$ "only via variables of Form L", i.e., if there is a path $x_{l_1}, \ldots, x_{l_m}$ in the dependency graph such that $l_1 = i$ and $l_m = j$, and such that for each $1 \leq h \leq m - 1$, $x_{l_h} = P_{l_h}(x) = p_{l_h,0} + \sum_{g=1}^{n} p_{l_h,g}x_g$ has Form L with $p_{l_h,l_{h+1}} > 0$, then there is a $1 \leq k \leq n$ such that, for any vector $x$ with $0 \leq x \leq 1$,

$$(P'(x)^k)_{ij} \geq 2^{-|P|}$$



Proof.

(i) Let the sequence of variables $x_{l_1}, \ldots, x_{l_k}$ constitute a shortest path from $x_i$ to $x_j$, with $k \geq 2$. Such a shortest path exists, since $x_i$ depends on $x_j$. So $x_i = x_{l_1}$ and $x_j = x_{l_k}$, and $x_{l_{h+1}}$ appears in the expression for $P_{l_h}(x)$, for $1 \leq h \leq k - 1$. Note that the path has $k - 1 \leq n$ edges, since a shortest path repeats no variable except possibly its two endpoints. Thus $(P'(1))_{l_h l_{h+1}} > 0$ for $1 \leq h \leq k - 1$. But since $P'(1)$ is a non-negative matrix, $(P'(1)^{k-1})_{ij} \geq \prod_{h=1}^{k-1}(P'(1))_{l_h l_{h+1}}$. Since we have chosen a shortest (non-empty) path from $x_i$ to $x_j$, and since $x = P(x)$ is in SNF form, each $(P'(1))_{l_h l_{h+1}}$ that is not exactly 1 must be a distinct rational coefficient in $P$, not appearing elsewhere along the path, and thus

$$\prod_{h=1}^{k-1}(P'(1))_{l_h l_{h+1}} \geq 2^{-|P|}$$

(ii) For $k \geq 1$, we can expand $(P'(1)^k)_{ij}$ into a sum of $n^{k-1}$ terms of the form $\prod_{h=1}^{k}(P'(1))_{l_h l_{h+1}}$, with $l_1 = i$, $l_{k+1} = j$ and $(l_2, \ldots, l_k) \in \{1, \ldots, n\}^{k-1}$. If $(P'(1)^k)_{ij} > 0$, at least one of these terms has $\prod_{h=1}^{k}(P'(1))_{l_h l_{h+1}} > 0$. In that case, $x_{l_1}, \ldots, x_{l_{k+1}}$ is a path in the dependency graph starting at $x_i$ and ending at $x_j$.
(iii) Let us choose $x_{l_1}, \ldots, x_{l_k}$ to be a shortest path from $x_i$ to $x_j$, with $k \geq 2$, such that every equation $x_{l_h} = P_{l_h}(x)$ along the path, for all $h \in \{1, \ldots, k-1\}$, has Form L. As in (i), we must have $k - 1 \leq n$. By monotonicity of $P'(z)$ in $z \geq 0$, we have $P'(1)^{k-1} \geq P'(x)^{k-1}$. Furthermore, since $x_{l_1}, \ldots, x_{l_k}$ is a path from $x_i$ to $x_j$, we have $(P'(x)^{k-1})_{ij} \geq \prod_{h=1}^{k-1}(P'(x))_{l_h l_{h+1}}$. Moreover, since each equation $x_{l_h} = P_{l_h}(x)$ has Form L, for every $h \in \{1, \ldots, k-1\}$, we must have $(P'(x))_{l_h l_{h+1}} = (P'(1))_{l_h l_{h+1}}$ (because all the partial derivatives of linear expressions are constants). But we argued in (i) that, when $x_{l_1}, \ldots, x_{l_k}$ constitutes a shortest path from $x_i$ to $x_j$,

$$\prod_{h=1}^{k-1}(P'(1))_{l_h l_{h+1}} \geq 2^{-|P|}$$

□ 
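Part (ii) of Lemma C.2 suggests a direct way to compute the "depends on" relation from the support of $P'(1)$: $x_i$ depends on $x_j$ iff some Boolean power of the support pattern, with exponent between $1$ and $n$, has a non-zero $(i, j)$ entry. A minimal numpy sketch follows; the function name and the example matrix (a chain $x_0 \to x_1 \to x_2$, with 0-based indices) are hypothetical. Working with the 0/1 support avoids tiny products underflowing to zero.

```python
import numpy as np


def depends_on(B):
    """D[i, j] is True iff (B^k)[i, j] > 0 for some 1 <= k <= n (B non-negative).

    Only the support of B is used, so tiny coefficients cannot underflow to 0.
    """
    n = B.shape[0]
    support = (B > 0).astype(int)
    power = np.eye(n, dtype=int)
    D = np.zeros((n, n), dtype=bool)
    for _ in range(n):
        power = np.minimum(power @ support, 1)  # Boolean product, clipped to {0, 1}
        D |= power > 0
    return D


# Hypothetical P'(1) for x_0 -> x_1 -> x_2, where x_2 depends on nothing.
B = np.array([[0.0, 0.5, 0.0],
              [0.0, 0.0, 0.25],
              [0.0, 0.0, 0.0]])
D = depends_on(B)
print(D.astype(int))
# Quantitative version along the path x_0 -> x_1 -> x_2, as in Lemma C.2 (i):
print(np.linalg.matrix_power(B, 2)[0, 2])  # 0.125
```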

We need a basic result from the Perron-Frobenius theory of non-negative matrices. We are not aware of a source that contains a statement exactly equivalent to (or implying) the following Lemma, so we provide a proof; however, it is entirely possible (and likely) that such a Lemma has appeared elsewhere. Lemma 19 of [13] provides a similar result for the case when the matrix $A$ is irreducible.

Lemma C.3. If $A$ is a non-negative matrix, the vector $u > 0$ is such that $Au \leq u$ and $\|u\|_\infty \leq 1$, and $\alpha, \beta \in (0, 1)$ are constants such that for every $i \in \{1, \ldots, n\}$ one of the following two conditions holds:

(I) $(Au)_i \leq (1 - \beta)u_i$

(II) there is some $k$, $1 \leq k \leq n$, and some $j$, such that $(A^k)_{ij} \geq \alpha$ and $(Au)_j \leq (1 - \beta)u_j$,

then $(I - A)$ is non-singular and

$$\|(I - A)^{-1}\|_\infty \leq \frac{n}{u_{\min}^2\,\alpha\beta}$$



Proof. First, suppose that some $i \in \{1, \ldots, n\}$ satisfies condition (I). Then we claim that it satisfies condition (II), except that we must take $k = 0$. Specifically, if we let $k = 0$ and $j = i$, then since $A^0 = I$ and $(A^0)_{ii} = I_{ii} = 1 \geq \alpha$, condition (II) boils down to $(Au)_i \leq (1 - \beta)u_i$. So, to prove the statement, it suffices to consider only condition (II), but to allow $k = 0$ in that condition. So, by assumption, given any $i \in \{1, \ldots, n\}$, there is some $0 \leq k \leq n$ and some $j$ such that

$$(A^k)_{ij} \geq \alpha > 0 \tag{20}$$

and moreover $(Au)_j \leq (1 - \beta)u_j$, which we can rewrite as:

$$u_j - (Au)_j \geq \beta u_j \; (> 0) \tag{21}$$
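Let $u_{\min} = \min_i u_i$. We thus have that for every $i$:

$$\begin{aligned}
(A^n u)_i &= \Big(u - \sum_{l=0}^{n-1} A^l(u - Au)\Big)_i \\
&\leq \big(u - A^k(u - Au)\big)_i && \text{(because } A^l \geq 0 \text{ and } (u - Au) \geq 0\text{)} \\
&= u_i - \sum_{j'=1}^{n}(A^k)_{ij'}\big(u_{j'} - (Au)_{j'}\big) \\
&\leq u_i - (A^k)_{ij}\big(u_j - (Au)_j\big) && \text{(again, because } A^k \geq 0 \text{ and } u_{j'} - (Au)_{j'} \geq 0 \text{ for every } j'\text{)} \\
&\leq u_i - \alpha\beta u_j && \text{(by (20) and (21))} \\
&\leq u_i - \alpha\beta u_{\min} \\
&\leq u_i - u_{\min}\alpha\beta u_i && \text{(recalling that, by assumption, } \|u\|_\infty \leq 1\text{)}
\end{aligned}$$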
We have that $A^n u \leq (1 - u_{\min}\alpha\beta)u$. Of course, $(1 - u_{\min}\alpha\beta) < 1$. So we have that

$$A^{mn}u \leq (1 - u_{\min}\alpha\beta)^m u$$

For any integer $d \geq 0$, $A^d u \leq u$. Thus also, for every $d \geq 0$,

$$A^d u \leq (1 - u_{\min}\alpha\beta)^{\lfloor d/n \rfloor}u \tag{22}$$

We thus have that, as $m \to \infty$, $A^m u \to 0$. Since $u > 0$ and $A \geq 0$, this implies that as $m \to \infty$, $A^m \to 0$ (coordinate-wise), or in other words, $\lim_{m \to \infty}\|A^m\|_\infty = 0$. This is equivalent to saying that the spectral radius satisfies $\rho(A) < 1$. Recall that this implies that the inverse matrix $(I - A)^{-1} = \sum_{k=0}^{\infty}A^k \geq 0$ exists:

Lemma C.4 (see, e.g., Theorem 5.6.9 and Corollary 5.6.16). If $A$ is a square matrix with $\rho(A) < 1$, then $(I - A)$ is non-singular, the series $\sum_{k=0}^{\infty}A^k$ converges, and $(I - A)^{-1} = \sum_{k=0}^{\infty}A^k$.
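Lemma C.4 can be checked numerically: for a non-negative matrix with spectral radius below 1, the partial sums $\sum_{k=0}^{K}A^k$ approach $(I - A)^{-1}$. The $2 \times 2$ matrix below is an arbitrary example chosen for illustration (its eigenvalues are $0$ and $0.5$), and `neumann_partial_sum` is a hypothetical helper name.

```python
import numpy as np


def neumann_partial_sum(A, terms):
    """Return sum_{k=0}^{terms-1} A^k."""
    n = A.shape[0]
    total = np.zeros((n, n))
    power = np.eye(n)
    for _ in range(terms):
        total += power
        power = power @ A
    return total


A = np.array([[0.0, 0.9],
              [0.0, 0.5]])
rho = max(abs(np.linalg.eigvals(A)))          # spectral radius
inverse = np.linalg.inv(np.eye(2) - A)
series = neumann_partial_sum(A, 200)
print(rho, np.allclose(series, inverse))
```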

We will use the following easy fact: 

Lemma C.5. If $M$ is a non-negative $n \times n$ matrix, $u > 0$ is a vector with $\|u\|_\infty \leq 1$, and $\lambda > 0$ is a real number satisfying $Mu \leq \lambda u$, then

$$\|M\|_\infty \leq \frac{\lambda}{u_{\min}}$$



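Proof. Since $M$ is non-negative, $\|M\|_\infty$ is the maximum row sum of $M$. There is thus an $i$ such that

$$\|M\|_\infty = \sum_j m_{ij}$$

where $m_{ij}$ are the entries of $M$. For this $i$:

$$\lambda u_i \geq (Mu)_i = \sum_j m_{ij}u_j \geq \sum_j m_{ij}u_{\min} = \|M\|_\infty u_{\min}$$

but $u_i \leq 1$, giving us $\|M\|_\infty \leq \frac{\lambda}{u_{\min}}$. □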
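A small numerical instance of Lemma C.5, for illustration only: the matrix $M$ and vector $u$ below are arbitrary assumptions, and the smallest admissible $\lambda$ is $\max_i (Mu)_i/u_i$. The hypothetical helper returns the bound $\lambda/u_{\min}$, which is then compared with the maximum absolute row sum $\|M\|_\infty$.

```python
import numpy as np


def lemma_c5_bound(M, u):
    """Return lambda / u_min, where lambda is the smallest value with M u <= lambda u."""
    assert np.all(M >= 0) and np.all(u > 0) and u.max() <= 1
    lam = np.max((M @ u) / u)
    return lam / u.min()


M = np.array([[1.0, 1.8],
              [0.0, 2.0]])
u = np.array([1.0, 0.5])
bound = lemma_c5_bound(M, u)
norm = np.linalg.norm(M, np.inf)  # maximum absolute row sum
print(norm, bound)
```

Here $Mu = (1.9, 1.0)$, so $\lambda = 2$ and the bound is $2/0.5 = 4$, while the row sums give $\|M\|_\infty = 2.8$.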
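Now we can complete the proof of Lemma C.3. We have:

$$\begin{aligned}
(I - A)^{-1}u &= \Big(\sum_{k=0}^{\infty}A^k\Big)u = \sum_{k=0}^{\infty}A^k u \\
&\leq \sum_{k=0}^{\infty}(1 - u_{\min}\alpha\beta)^{\lfloor k/n \rfloor}u && \text{(by (22))} \\
&= \sum_{m=0}^{\infty}n(1 - u_{\min}\alpha\beta)^m u \\
&= \frac{n}{u_{\min}\alpha\beta}u
\end{aligned}$$

the last equality holding because the geometric series gives $\sum_{m=0}^{\infty}(1 - u_{\min}\alpha\beta)^m = \frac{1}{u_{\min}\alpha\beta}$. Lemma C.5, with $M := (I - A)^{-1} = \sum_{k=0}^{\infty}A^k$ and $\lambda := \frac{n}{u_{\min}\alpha\beta}$, now yields:

$$\|(I - A)^{-1}\|_\infty \leq \frac{n}{u_{\min}^2\,\alpha\beta}$$

and this completes the proof of Lemma C.3. □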
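The hypotheses and the bound of Lemma C.3 can be exercised on a small example. The following Python sketch (the function name, tolerance and example values are illustrative assumptions) checks conditions (I) and (II) row by row, using the powers $A^k$ for $1 \leq k \leq n$, and returns $n/(u_{\min}^2\alpha\beta)$. In the example, with 0-based indices, row 1 satisfies (I), and row 0 satisfies (II) through the entry $A_{01} = 0.9 \geq \alpha$.

```python
import numpy as np


def lemma_c3_bound(A, u, alpha, beta, tol=1e-12):
    """Check the hypotheses of Lemma C.3 and return n / (u_min^2 * alpha * beta)."""
    n = A.shape[0]
    Au = A @ u
    if not (np.all(A >= 0) and np.all(u > 0) and u.max() <= 1 and np.all(Au <= u + tol)):
        raise ValueError("need A >= 0, u > 0, ||u||_inf <= 1 and Au <= u")
    cond_I = Au <= (1 - beta) * u + tol  # rows satisfying condition (I)
    powers = [np.linalg.matrix_power(A, k) for k in range(1, n + 1)]
    for i in range(n):
        cond_II = any(Ak[i, j] >= alpha and cond_I[j] for Ak in powers for j in range(n))
        if not (cond_I[i] or cond_II):
            raise ValueError(f"row {i} satisfies neither (I) nor (II)")
    return n / (u.min() ** 2 * alpha * beta)


A = np.array([[0.0, 0.9],
              [0.0, 0.5]])
u = np.ones(2)
bound = lemma_c3_bound(A, u, alpha=0.5, beta=0.5)
actual = np.linalg.norm(np.linalg.inv(np.eye(2) - A), np.inf)
print(actual, bound)
```

The bound ($8$) is far from tight here (the actual norm is about $2.8$), which is expected: the lemma trades precision for a bound that only depends on $n$, $u_{\min}$, $\alpha$ and $\beta$.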

Proof of Theorem 4.6. Before we start to prove cases (i) and (ii) of the Theorem, we need to develop some more lemmas.

Proposition C.6. For a PPS, $x = P(x)$, with LFP $q^* > 0$, for every variable $x_i$, either $P_i(0) > 0$ or $x_i$ depends on a variable $x_j$ with $P_j(0) > 0$.

Proof. Suppose, for contradiction, that a variable $x_i$ has $P_i(0) = 0$ and depends only on variables $x_j$ which have $P_j(0) = 0$. Then $(P^m(0))_i = 0$ for all $m$. But $P^m(0) \to q^*$ as $m \to \infty$ (see, e.g., Theorem 3.1 of [15]). So $q^*_i = 0$, contradicting $q^* > 0$. □
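The convergence $P^m(0) \to q^*$ used in this proof is value (Kleene) iteration from the zero vector. A short Python illustration on a hypothetical two-variable PPS: $x_0 = \frac{1}{2}x_0^2 + \frac{1}{4}$ has $P_0(0) > 0$ and its iterates converge to $1 - 1/\sqrt{2}$, whereas $x_1 = x_1^2$ has $P_1(0) = 0$ and depends only on itself, so its iterates stay at $0$ and $q^*_1 = 0$, which is exactly the situation excluded by the hypothesis $q^* > 0$.

```python
import numpy as np


def kleene(P, n, iterations):
    """Return P^m(0) for m = iterations (value iteration from the zero vector)."""
    x = np.zeros(n)
    for _ in range(iterations):
        x = P(x)
    return x


# Hypothetical PPS: x_0 = 0.5 x_0^2 + 0.25,  x_1 = x_1^2.
P = lambda x: np.array([0.5 * x[0] ** 2 + 0.25, x[1] ** 2])
x = kleene(P, 2, 200)
print(x)
```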

The case when all the equations, $x_i = P_i(x)$, are linear has to be treated a little differently, and we tackle that first:

Lemma C.7. If $x = P(x)$ is a PPS that has no equations of Form Q, and has LFP $q^* > 0$, then

$$\|(I - P')^{-1}\|_\infty \leq n2^{2|P|}$$

where $P'$ is the constant Jacobian matrix of $P(x)$ (i.e., $P' = P'(x)$ for all $x$).

Proof. First, note that $P'$ is a sub-stochastic matrix, i.e., $P'1 \leq 1$. We now call a variable $x_i$ leaky if $(P'1)_i < 1$. Since $P_i(x) = \sum_{j=1}^{n}p_{ij}x_j + p_{i,0}$, this means that $(P'1)_i = \sum_{j=1}^{n}\frac{\partial P_i(x)}{\partial x_j} = \sum_{j=1}^{n}p_{ij} < 1$.

Note that since $q^* > 0$, it must be the case that for every variable $x_i$, either $x_i$ itself is leaky, or $x_i$ depends (possibly indirectly) on a leaky variable $x_j$. This is because a non-leaky variable $x_r$ has $\sum_j p_{rj} = 1$, hence $p_{r,0} = 0$ and $P_r(0) = 0$; so if some $x_i$ did not satisfy this, Proposition C.6 would give $q^*_i = 0$, which cannot be the case.
Since the entries of $P'$ are either 0, 1, or coefficients $p_{ij}$ from $P(x)$, we see that for every leaky variable $x_i$, $(P'1)_i = \sum_{j=1}^{n}p_{ij} \leq 1 - 2^{-|P|}$ (see footnote 5).

For any non-leaky variable $x_r$, there is a leaky variable $x_i$ that $x_r$ depends on, and $x_r$ does not depend on any variables of Form Q. Thus, by Lemma C.2 (iii), there is a $k$, $1 \leq k \leq n$, such that $((P')^k)_{ri} \geq 2^{-|P|}$.

We can thus apply Lemma C.3 with matrix $A := P'$ and vector $u := 1$, with $\alpha := \beta := 2^{-|P|}$, because we have just established that condition (I) of that Lemma applies to leaky variables $x_i$, and condition (II) applies to non-leaky variables. Thus Lemma C.3 gives us that

$$\|(I - P')^{-1}\|_\infty \leq \left(\frac{1}{1_{\min}}\right)^2 n2^{2|P|}$$

Of course, $1_{\min} = 1$. □
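The leaky/non-leaky classification and the gap bound from footnote 5 can be illustrated with exact rational arithmetic. In this Python sketch (the encoding and names are assumptions for illustration), `rows[i]` lists the coefficients $p_{ij}$, $j \geq 1$, of a Form L equation as `Fraction`s; for each leaky row it returns the gap $1 - \sum_j p_{ij}$ together with the footnote-5 lower bound $1/\prod_j b_j$, where the $b_j$ are the (reduced) denominators of the positive coefficients. Since the encoding size $|P|$ includes the bits of every $b_j$, that lower bound is in turn at least $2^{-|P|}$.

```python
from fractions import Fraction
from math import prod


def leaky_gaps(rows):
    """Map each leaky row i to (1 - sum_j p_ij, 1 / product of denominators of p_ij > 0)."""
    gaps = {}
    for i, coeffs in enumerate(rows):
        gap = 1 - sum(coeffs, Fraction(0))
        if gap > 0:  # row i is leaky: (P'1)_i < 1
            denominators = prod(c.denominator for c in coeffs if c > 0)
            gaps[i] = (gap, Fraction(1, denominators))
    return gaps


rows = [
    [Fraction(1, 3), Fraction(1, 2)],  # leaky: 1 - 5/6 = 1/6
    [Fraction(1, 1)],                  # not leaky
    [Fraction(2, 5), Fraction(1, 4)],  # leaky: 1 - 13/20 = 7/20
]
gaps = leaky_gaps(rows)
print(gaps)
```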

We are now ready to prove parts (i) and (ii) of Theorem 4.6.
(i) When $q^* < 1$, we can say something stronger than Proposition C.6:

Lemma C.8. For any PPS, $x = P(x)$, with LFP $0 < q^* < 1$, and for any variable $x_i$, either
(I) the equation $x_i = P_i(x)$ is of Form Q, or else $P_i(1) < 1$; or
(II) $x_i$ depends on a variable $x_j$ such that $x_j = P_j(x)$ is of Form Q, or else $P_j(1) < 1$.

Proof. Suppose, for contradiction, that there is a variable $x_i$ for which neither (I) nor (II) holds. Let $D_i$ be the set of variables that $x_i$ depends on, together with $x_i$ itself. For any vector $x$, consider the subvector $x_{D_i}$, which consists of the components of $x$ with coordinates in $D_i$. We can consider the subset of the equations $x_{D_i} = P_{D_i}(x)$. By transitivity of dependency, $P_{D_i}(x)$ contains only terms in the variables $x_{D_i}$. So $x_{D_i} = P_{D_i}(x) = P_{D_i}(x_{D_i})$ is itself a PPS. Since, by assumption, neither (I) nor (II) holds for $x_i$, the PPS $x_{D_i} = P_{D_i}(x_{D_i})$ contains no equations of Form Q, and $P_{D_i}(1) = 1$. Since, therefore, $P_{D_i}(x_{D_i})$ is linear, we can rewrite $x_{D_i} = P_{D_i}(x_{D_i})$ as $x_{D_i} = P'_{D_i}x_{D_i} + P_{D_i}(0)$, and hence $(I - P'_{D_i})x_{D_i} = P_{D_i}(0)$. Lemma C.7, applied to the PPS $x_{D_i} = P_{D_i}(x_{D_i})$ (whose LFP $q^*_{D_i}$ is positive), gives us that, in particular, $(I - P'_{D_i})$ is non-singular. Consequently, $x_{D_i} = P_{D_i}(x_{D_i})$ has a unique solution. But we already said that $1$ is a solution, $P_{D_i}(1) = 1$, and so $q^*_{D_i} = 1$. This contradicts $q^* < 1$. So there can be no $x_i$ for which neither (I) nor (II) holds. □

To obtain the conclusion of case (i) of Theorem 4.6, assuming all of the premises of the Theorem's statement, we will now aim to use Lemma C.3, applied to $A := P'(\frac{1}{2}(y + q^*))$ and $u := 1 - q^*$.

By Lemma C.8, every variable $x_i$ either depends on a variable, or is itself equal to a variable, $x_j$, such that $x_j = P_j(x)$ is of Form Q or $P_j(1) < 1$. We can clearly assume that such a dependence is linear in the sense of Lemma C.2 (iii) (take a shortest such dependence, so that every intermediate equation has Form L), and thus for any $x_i$ there is a $0 \leq k \leq n$ with $(P'(x)^k)_{ij} \geq 2^{-|P|}$ for all $0 \leq x \leq 1$, for some $x_j$ with either $x_j = P_j(x)$ of Form Q or $P_j(1) < 1$.

We need to show that for such an $x_j$ we have $(P'(\frac{1}{2}(y + q^*))(1 - q^*))_j \leq (1 - \beta)(1 - q^*)_j$ for a suitable constant $\beta \in (0, 1)$.

5 This inequality holds because we assume each positive input probability $p_{ij}$ is represented as a ratio $\frac{a_j}{b_j}$ of positive integers in the encoding of $x = P(x)$, and thus $1 - \sum_{j=1}^{n}\frac{a_j}{b_j}$ can be represented as a ratio $\frac{a}{b}$ of two positive integers, where the denominator is $b = \prod_{j=1}^{n}b_j$. But then $\left(1 - \sum_{j=1}^{n}\frac{a_j}{b_j}\right) = \frac{a}{b} \geq \frac{1}{\prod_{j=1}^{n}b_j} \geq \frac{1}{2^{|P|}}$.
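For any variable $x_j$ such that $x_j = P_j(x)$ has Form Q, we have that $x_j = x_k x_l$ for some variables $x_k$ and $x_l$. Thus, since $\frac{\partial P_j(x)}{\partial x_k} = x_l$ and $\frac{\partial P_j(x)}{\partial x_l} = x_k$, we have that:

$$\begin{aligned}
\big(P'(\tfrac{1}{2}(q^* + y))(1 - q^*)\big)_j &= \tfrac{1}{2}(q^*_l + y_l)(1 - q^*_k) + \tfrac{1}{2}(q^*_k + y_k)(1 - q^*_l) \\
&= \tfrac{1}{2}\big((q^*_l + 1) - (1 - y_l)\big)(1 - q^*_k) + \tfrac{1}{2}\big((q^*_k + 1) - (1 - y_k)\big)(1 - q^*_l) \\
&= \tfrac{1}{2}\big((q^*_l + 1)(1 - q^*_k) - (1 - y_l)(1 - q^*_k) + (q^*_k + 1)(1 - q^*_l) - (1 - y_k)(1 - q^*_l)\big) \\
&= \tfrac{1}{2}\big(2 - 2q^*_k q^*_l - (1 - y_l)(1 - q^*_k) - (1 - y_k)(1 - q^*_l)\big) \\
&\leq \tfrac{1}{2}\big(2 - 2q^*_k q^*_l - (1 - y)_{\min}\big((1 - q^*)_k + (1 - q^*)_l\big)\big) \\
&\leq \tfrac{1}{2}\big(2 - 2q^*_k q^*_l - (1 - y)_{\min}\big((1 - q^*)_k + (1 - q^*)_l - (1 - q^*)_k(1 - q^*)_l\big)\big) \\
&= (1 - q^*_j) - \tfrac{1}{2}(1 - y)_{\min}(1 - q^*_j) \\
&= \big(1 - \tfrac{1}{2}(1 - y)_{\min}\big)(1 - q^*)_j
\end{aligned}$$

where the last two steps use $q^*_j = q^*_k q^*_l$ and $(1 - q^*_k) + (1 - q^*_l) - (1 - q^*_k)(1 - q^*_l) = 1 - q^*_k q^*_l$.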

If, on the other hand, $x_j$ has $P_j(1) < 1$, then $x_j = P_j(x)$ has Form L and, as in the proof of Lemma C.7 (specifically footnote 5), we must have

$$P_j(1) \leq 1 - 2^{-|P|} \tag{23}$$

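We thus have that:

$$\begin{aligned}
\big(P'(\tfrac{1}{2}(y + q^*))(1 - q^*)\big)_j &= \sum_{l=1}^{n}\frac{\partial P_j(x)}{\partial x_l}(1 - q^*_l) \\
&= \sum_{l=1}^{n}p_{j,l}(1 - q^*_l) \\
&= \Big(\sum_{l=1}^{n}p_{j,l}\Big) + p_{j,0} - \Big(\sum_{l=1}^{n}p_{j,l}q^*_l\Big) - p_{j,0} \\
&= P_j(1) - P_j(q^*) = P_j(1) - q^*_j \\
&\leq (1 - 2^{-|P|}) - q^*_j && \text{(by (23))} \\
&= (1 - q^*_j) - 2^{-|P|} \\
&\leq (1 - 2^{-|P|})(1 - q^*)_j
\end{aligned}$$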
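Both case computations can be sanity-checked numerically. The Python sketch below (all sample values are illustrative assumptions) verifies, on random inputs, the Form Q inequality $\frac{1}{2}(q^*_l + y_l)(1 - q^*_k) + \frac{1}{2}(q^*_k + y_k)(1 - q^*_l) \leq (1 - \frac{1}{2}(1 - y)_{\min})(1 - q^*_k q^*_l)$, using the minimum over the two coordinates involved (which only makes the check stronger). It then checks the Form L identity and bound with exact rationals on the hypothetical linear PPS $x_0 = \frac{1}{4} + \frac{1}{2}x_1$, $x_1 = \frac{1}{3} + \frac{1}{3}x_0$, whose LFP is $q^* = (\frac{1}{2}, \frac{1}{2})$; there, the role of $2^{-|P|}$ is played by the actual gap $1 - P_j(1)$.

```python
import random
from fractions import Fraction as F

# Form Q: x_j = x_k * x_l, so (P'(z) w)_j = z_l w_k + z_k w_l with z = (q* + y) / 2.
random.seed(0)
form_q_ok = True
for _ in range(10_000):
    qk, ql, yk, yl = (random.random() for _ in range(4))
    lhs = 0.5 * (ql + yl) * (1 - qk) + 0.5 * (qk + yk) * (1 - ql)
    y_gap = min(1 - yk, 1 - yl)  # at least (1 - y)_min
    rhs = (1 - 0.5 * y_gap) * (1 - qk * ql)
    form_q_ok = form_q_ok and lhs <= rhs + 1e-12

# Form L: hypothetical linear PPS with LFP q* = (1/2, 1/2).
p0 = [F(1, 4), F(1, 3)]
p = [[F(0), F(1, 2)], [F(1, 3), F(0)]]
q = [F(1, 2), F(1, 2)]


def P(j, x):
    return p0[j] + sum(p[j][l] * x[l] for l in range(2))


ones = [F(1), F(1)]
form_l_rows = []
for j in range(2):
    lhs = sum(p[j][l] * (1 - q[l]) for l in range(2))  # (P'(z)(1 - q*))_j for any z
    gap = 1 - P(j, ones)                                # 1 - P_j(1) > 0
    form_l_rows.append((lhs, P(j, ones) - q[j], (1 - gap) * (1 - q[j])))
print(form_q_ok, form_l_rows)
```

Each tuple in `form_l_rows` holds the left-hand side, $P_j(1) - q^*_j$, and the final bound $(1 - \text{gap})(1 - q^*_j)$, so the first two entries should agree and neither should exceed the third.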

To be able to apply Lemma C.3, it only remains to show that $P'(\frac{1}{2}(y + q^*))(1 - q^*) \leq (1 - q^*)$. But Lemma 3.5 of [11] established that $P'(\frac{1}{2}(1 + q^*))(1 - q^*) \leq (1 - q^*)$. Since $0 \leq y < 1$, it follows by monotonicity of $P'(z)$ in $z$ that $P'(\frac{1}{2}(y + q^*))(1 - q^*) \leq (1 - q^*)$.

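Thus, we can apply Lemma C.3 by setting $A := P'(\frac{1}{2}(y + q^*))$, $u := (1 - q^*)$, $\alpha := 2^{-|P|}$, and $\beta := \min\{\frac{1}{2}(1 - y)_{\min}, 2^{-|P|}\}$, and we obtain:

$$\|(I - P'(\tfrac{1}{2}(y + q^*)))^{-1}\|_\infty \leq n(1 - q^*)_{\min}^{-2}\max\{2(1 - y)_{\min}^{-1}, 2^{|P|}\}\,2^{|P|}$$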
Recall that, as established earlier, $(1 - q^*)_{\min} \geq 2^{-4|P|}$. Thus, using also $n \leq 2^{|P|}$,

$$\begin{aligned}
\|(I - P'(\tfrac{1}{2}(y + q^*)))^{-1}\|_\infty &\leq n2^{9|P|}\max\{2(1 - y)_{\min}^{-1}, 2^{|P|}\} \\
&\leq 2^{10|P|}\max\{2(1 - y)_{\min}^{-1}, 2^{|P|}\}
\end{aligned}$$

We now prove part (ii) of Theorem 4.6. If $x = P(x)$ is strongly connected and there is an $x_i$ with $x_i = P_i(x)$ of Form Q, then every variable depends on it. If there are no such variables, then Lemma C.7 gives that, for any $x \in \mathbb{R}^n$, $\|(I - P'(x))^{-1}\|_\infty \leq n2^{2|P|}$, which is at most $2^{4|P|}\frac{1}{(1 - y)_{\min}}$ (since $n \leq 2^{|P|}$ and $(1 - y)_{\min} \leq 1$), and we are done. So we can assume that there is an $x_i$ with $x_i = P_i(x)$ of Form Q. We quote the following from [15]:

Lemma C.9 (see proof of Theorem 8.1 in [15]). If $x = P(x)$ is strongly connected and $q^* > 0$, then $q^* = 1$ iff $\rho(P'(1)) \leq 1$.

$P'(1)$ is a non-negative irreducible matrix. Perron-Frobenius theory gives us that there is a positive eigenvector $v > 0$, with associated eigenvalue $\rho(P'(1))$, the spectral radius of $P'(1)$, i.e., such that $P'(1)v = \rho(P'(1))v$. But $\rho(P'(1)) \leq 1$, so $P'(1)v \leq v$.

Lemma C.10 (cf. Lemma 5.9 of [9]). $\frac{\|v\|_\infty}{v_{\min}} \leq 2^{|P|}$.

Proof. For any $x_i, x_j$, since $x = P(x)$ is strongly connected, $x_i$ depends on $x_j$, so by Lemma C.2 (i) there is some $1 \leq k \leq n$ with $(P'(1)^k)_{ij} \geq 2^{-|P|}$. We know that $P'(1)^k v \leq v$. So $(P'(1)^k)_{ij}v_j \leq (P'(1)^k v)_i = \rho(P'(1))^k v_i \leq v_i$, and therefore $\frac{v_j}{v_i} \leq 2^{|P|}$. There are $v_i, v_j$ that achieve $v_i = v_{\min}$ and $v_j = \|v\|_\infty$, so we are done. □

We can normalise the top eigenvector $v$, so we can assume that $\|v\|_\infty = 1$. Then $v_{\min} \geq 2^{-|P|}$. Consider any equation $x_i = P_i(x) = x_j x_k$ of Form Q (we have already dealt with the case where no such equation exists):

$$\begin{aligned}
(P'(y)v)_i &= y_j v_k + y_k v_j \\
&\leq y_{\max}v_k + y_{\max}v_j && \text{(where } y_{\max} := \max_r y_r\text{)} \\
&= (1 - (1 - y)_{\min})(v_k + v_j) \\
&= (1 - (1 - y)_{\min})(P'(1)v)_i \\
&= (1 - (1 - y)_{\min})\rho(P'(1))v_i \\
&\leq (1 - (1 - y)_{\min})v_i && \text{(because } \rho(P'(1)) \leq 1\text{)}
\end{aligned}$$
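For a concrete instance of this argument, consider the hypothetical strongly connected PPS $x_0 = \frac{1}{2}x_1 + \frac{1}{2}$, $x_1 = x_0 x_0$ (Form Q with both factors equal), whose LFP is $q^* = 1$ and whose Jacobian at $1$ is $\begin{pmatrix}0 & 1/2 \\ 2 & 0\end{pmatrix}$, with spectral radius $1$. The Python sketch below computes a Perron eigenvector with `numpy.linalg.eig`, taking the eigenvalue of largest real part (this matrix also has the eigenvalue $-1$ of the same modulus, so plain power iteration would oscillate). It normalises the eigenvector so that $\|v\|_\infty = 1$ and checks the Form Q inequality $(P'(y)v)_1 \leq (1 - (1 - y)_{\min})v_1$ for one sample $y$. The encoding size $|P|$ is not modelled; the example only shows the ratio $\|v\|_\infty/v_{\min}$ bounded in Lemma C.10.

```python
import numpy as np


def jacobian(x):
    """P'(x) for the hypothetical PPS x_0 = x_1/2 + 1/2, x_1 = x_0 * x_0."""
    return np.array([[0.0, 0.5],
                     [2.0 * x[0], 0.0]])


B = jacobian(np.ones(2))                    # P'(1), irreducible and non-negative
eigenvalues, eigenvectors = np.linalg.eig(B)
top = np.argmax(eigenvalues.real)           # Perron root rho(P'(1))
rho = eigenvalues[top].real
v = np.abs(eigenvectors[:, top].real)
v = v / v.max()                             # normalise so that ||v||_inf = 1
ratio = v.max() / v.min()                   # the quantity bounded in Lemma C.10

y = np.array([0.3, 0.6])
lhs = (jacobian(y) @ v)[1]                  # row of the Form Q equation x_1 = x_0 x_0
rhs = (1 - (1 - y).min()) * v[1]
print(rho, v, ratio, lhs, rhs)
```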

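Now we can apply Lemma C.3 with $A := P'(y)$, $u := v$, $\alpha := 2^{-|P|}$, and $\beta := (1 - y)_{\min}$. (Here $P'(y)v \leq P'(1)v \leq v$ by monotonicity; variables of Form Q satisfy condition (I) by the calculation above; and every other variable satisfies condition (II) by Lemma C.2 (iii), since, by strong connectivity, it depends on a variable of Form Q via a shortest path whose intermediate equations have Form L.) We obtain that:

$$\|(I - P'(y))^{-1}\|_\infty \leq n v_{\min}^{-2}(1 - y)_{\min}^{-1}2^{|P|}$$

Inserting our bound for $v_{\min}$, namely $v_{\min} \geq 2^{-|P|}$, yields:

$$\begin{aligned}
\|(I - P'(y))^{-1}\|_\infty &\leq n2^{3|P|}(1 - y)_{\min}^{-1} \\
&\leq 2^{4|P|}(1 - y)_{\min}^{-1}
\end{aligned}$$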



□ 